2024-08-13 14:40:41,651 INFO [train_multi_KD3.py:1187] (3/4) Training started 2024-08-13 14:40:41,651 INFO [train_multi_KD3.py:1197] (3/4) Device: cuda:3 2024-08-13 14:40:41,653 INFO [train_multi_KD3.py:1212] (3/4) Using dtype=torch.bfloat16 2024-08-13 14:40:41,653 INFO [train_multi_KD3.py:1214] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 16, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 
10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True} 2024-08-13 14:40:41,653 INFO [train_multi_KD3.py:1216] (3/4) About 
to create model 2024-08-13 14:40:42,097 INFO [model_shift.py:142] (3/4) Delta_t: 6 when computing the distillation loss 2024-08-13 14:40:42,101 INFO [train_multi_KD3.py:1220] (3/4) Number of model parameters: 66484678 2024-08-13 14:40:42,102 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-15.pt 2024-08-13 14:40:44,461 INFO [train_multi_KD3.py:1235] (3/4) Using DDP 2024-08-13 14:40:46,722 INFO [train_multi_KD3.py:1247] (3/4) Loading optimizer state dict 2024-08-13 14:40:47,145 INFO [train_multi_KD3.py:1255] (3/4) Loading scheduler state dict 2024-08-13 14:40:47,145 INFO [kd_datamodule.py:690] (3/4) About to get train 960 cuts 2024-08-13 14:40:47,194 INFO [train_multi_KD3.py:1306] (3/4) Getting audioset cuts 2024-08-13 14:40:47,194 INFO [kd_datamodule.py:900] (3/4) About to get the audioset cuts for KD. 2024-08-13 14:40:47,196 INFO [kd_datamodule.py:869] (3/4) About to get the voxceleb cuts. 2024-08-13 14:40:47,197 INFO [kd_datamodule.py:880] (3/4) Adding voxceleb2 cuts. 
2024-08-13 14:40:47,198 INFO [train_multi_KD3.py:1320] (3/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True 2024-08-13 14:40:55,498 INFO [train_multi_KD3.py:1322] (3/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]] 2024-08-13 14:40:55,498 INFO [train_multi_KD3.py:1323] (3/4) Using weights: [1406195, 1904746, 1187704] 2024-08-13 14:40:55,498 INFO [train_multi_KD3.py:1332] (3/4) CutSet(len=4498645) [underlying data type: ] 2024-08-13 14:40:55,498 INFO [kd_datamodule.py:449] (3/4) Disable MUSAN 2024-08-13 14:40:55,498 INFO [kd_datamodule.py:489] (3/4) Disable SpecAugment 2024-08-13 14:40:55,499 INFO [kd_datamodule.py:491] (3/4) About to create train dataset 2024-08-13 14:40:55,502 INFO [kd_datamodule.py:528] (3/4) Using SimpleCutSampler 2024-08-13 14:40:55,502 INFO [kd_datamodule.py:536] (3/4) About to create train dataloader 2024-08-13 14:40:55,504 INFO [kd_datamodule.py:763] (3/4) About to get dev-clean cuts 2024-08-13 14:40:55,505 INFO [kd_datamodule.py:781] (3/4) About to get dev-other cuts 2024-08-13 14:40:55,506 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-13 14:40:55,786 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-13 14:40:55,786 INFO [kd_datamodule.py:840] (3/4) About to get the test set of voxceleb1 set. 2024-08-13 14:40:55,790 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-13 14:40:56,035 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-13 14:40:56,035 INFO [kd_datamodule.py:912] (3/4) About to get the audioset eval cuts. 
2024-08-13 14:40:56,042 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-13 14:40:56,523 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-13 14:40:56,523 INFO [train_multi_KD3.py:1412] (3/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset'] 2024-08-13 14:40:56,523 INFO [train_multi_KD3.py:1416] (3/4) Loading grad scaler state dict 2024-08-13 14:41:09,051 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 14:41:13,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 0, loss[loss=0.08975, beats_loss=0.01029, ecapa_loss=0.0001518, whisper_loss=0.07794, over 17020.00 frames. ], tot_loss[loss=0.08975, beats_loss=0.01029, ecapa_loss=0.0001518, whisper_loss=0.07794, over 17020.00 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:41:13,685 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 14:41:44,628 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005685, whisper_loss=0.2484, over 922467.00 frames. 2024-08-13 14:41:50,407 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5469, 2.6882, 2.8312, 2.6021], device='cuda:3') 2024-08-13 14:41:58,052 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on SV_voxceleb1: loss=0.004519, beats_loss=0, ecapa_loss=0.0004519, whisper_loss=0, over 939242.00 frames. 
2024-08-13 14:42:19,228 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2242, 2.4172, 2.4297, 2.3601], device='cuda:3') 2024-08-13 14:42:58,999 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7167, 1.9196, 1.6954, 1.3133, 1.5011, 1.4796, 1.8996, 1.6972], device='cuda:3') 2024-08-13 14:43:30,554 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on AT_audioset: loss=0.02374, beats_loss=0.02374, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 14:43:30,556 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-13 14:43:32,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2173810.0, ans=0.125 2024-08-13 14:43:33,922 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-13 14:43:46,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-08-13 14:44:01,616 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 14:45:10,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2174110.0, ans=0.125 2024-08-13 14:45:30,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2174210.0, ans=0.125 2024-08-13 14:45:41,024 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 14:45:59,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 50, loss[loss=0.1118, beats_loss=0.008191, ecapa_loss=0.000218, whisper_loss=0.1014, over 16294.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01015, ecapa_loss=0.0001637, whisper_loss=0.09082, over 867073.96 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:46:08,392 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 14:46:25,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2174310.0, ans=0.125 2024-08-13 14:46:38,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.896e+01 3.246e+01 4.521e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-13 14:46:55,578 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 14:48:09,838 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-13 14:48:47,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2174710.0, ans=0.125 2024-08-13 14:49:14,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 100, loss[loss=0.112, beats_loss=0.007638, ecapa_loss=0.0001625, whisper_loss=0.1027, over 15446.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.009818, ecapa_loss=0.0001653, whisper_loss=0.09228, over 1528892.85 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:50:09,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2174910.0, ans=0.125 2024-08-13 14:50:32,189 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 14:50:35,113 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
17 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 14:50:45,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2175010.0, ans=0.125 2024-08-13 14:51:54,110 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 14:52:20,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2175210.0, ans=0.125 2024-08-13 14:52:23,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2175210.0, ans=0.125 2024-08-13 14:52:28,387 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 14:52:29,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2024-08-13 14:52:32,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 150, loss[loss=0.08827, beats_loss=0.01176, ecapa_loss=0.0001577, whisper_loss=0.07494, over 19042.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009903, ecapa_loss=0.0001641, whisper_loss=0.09035, over 2030626.39 frames. ], batch size: 75, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:52:37,955 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 14:52:44,887 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
21 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 14:52:49,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2175310.0, ans=0.0 2024-08-13 14:53:06,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.627e+01 2.921e+01 3.180e+01 8.449e+01, threshold=5.841e+01, percent-clipped=2.0 2024-08-13 14:53:18,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2175410.0, ans=0.1 2024-08-13 14:53:26,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-08-13 14:53:44,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2175510.0, ans=0.0 2024-08-13 14:53:53,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2024-08-13 14:54:30,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2175610.0, ans=0.0 2024-08-13 14:54:33,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2175610.0, ans=0.125 2024-08-13 14:54:57,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2175710.0, ans=0.125 2024-08-13 14:54:59,619 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 14:54:59,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2175710.0, ans=0.125 2024-08-13 14:55:05,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 200, loss[loss=0.09486, beats_loss=0.008588, ecapa_loss=0.0001836, whisper_loss=0.08443, over 20916.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01009, ecapa_loss=0.0001655, whisper_loss=0.09043, over 2432430.22 frames. ], batch size: 85, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 14:55:11,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2175810.0, ans=0.125 2024-08-13 14:55:17,406 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 32 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 14:55:23,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-08-13 14:55:45,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2176010.0, ans=0.125 2024-08-13 14:56:02,451 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 14:56:03,263 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
27 from LS+wenet, 9 from Vox, 39 fro AS 2024-08-13 14:56:29,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2176210.0, ans=0.1 2024-08-13 14:56:31,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2176310.0, ans=0.125 2024-08-13 14:56:32,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 250, loss[loss=0.08955, beats_loss=0.0106, ecapa_loss=0.0001714, whisper_loss=0.07723, over 18238.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01024, ecapa_loss=0.000165, whisper_loss=0.09058, over 2703282.60 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 14:56:35,452 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 14:56:36,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2176310.0, ans=0.125 2024-08-13 14:56:36,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2176310.0, ans=0.125 2024-08-13 14:56:41,537 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 14:56:45,434 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 14:56:47,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.294e+01 2.573e+01 2.919e+01 5.746e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-13 14:56:58,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2176410.0, ans=0.2 2024-08-13 14:57:16,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.06 vs. 
limit=22.5 2024-08-13 14:57:25,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2176610.0, ans=0.2 2024-08-13 14:57:35,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2176610.0, ans=0.125 2024-08-13 14:57:35,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2176610.0, ans=0.125 2024-08-13 14:57:38,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2176710.0, ans=0.125 2024-08-13 14:57:41,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.69 vs. limit=22.5 2024-08-13 14:57:56,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 300, loss[loss=0.1092, beats_loss=0.0101, ecapa_loss=0.0002182, whisper_loss=0.09687, over 18873.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001663, whisper_loss=0.08997, over 2903570.07 frames. ], batch size: 79, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 14:58:46,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2177110.0, ans=0.0 2024-08-13 14:58:50,707 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 14:58:56,475 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 14:58:57,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2177110.0, ans=0.125 2024-08-13 14:58:57,722 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 14:59:09,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0 2024-08-13 14:59:14,690 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 14:59:15,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 350, loss[loss=0.09244, beats_loss=0.01013, ecapa_loss=0.0001761, whisper_loss=0.08054, over 14530.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001665, whisper_loss=0.08975, over 3098657.78 frames. ], batch size: 59, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 14:59:20,812 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 14:59:29,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-08-13 14:59:31,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.361e+01 2.664e+01 2.951e+01 4.705e+01, threshold=5.328e+01, percent-clipped=0.0 2024-08-13 14:59:32,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2177410.0, ans=0.125 2024-08-13 14:59:36,405 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 11 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 14:59:52,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2177510.0, ans=0.125 2024-08-13 14:59:55,358 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 15:00:11,722 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 15:00:15,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2177610.0, ans=0.2 2024-08-13 15:00:32,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2024-08-13 15:00:32,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 400, loss[loss=0.08352, beats_loss=0.009156, ecapa_loss=0.0001125, whisper_loss=0.07324, over 17937.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001649, whisper_loss=0.08871, over 3239253.25 frames. ], batch size: 63, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:00:39,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-13 15:00:42,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2177810.0, ans=0.04949747468305833 2024-08-13 15:00:46,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5 2024-08-13 15:01:03,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2178010.0, ans=0.07 2024-08-13 15:01:07,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.53 vs. limit=10.0 2024-08-13 15:01:12,643 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 15:01:14,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2178010.0, ans=0.0 2024-08-13 15:01:15,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-08-13 15:01:22,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2178110.0, ans=0.04949747468305833 2024-08-13 15:01:28,032 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 15:01:37,490 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 15:01:39,186 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 15:01:45,008 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.446e+01 2024-08-13 15:01:46,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2178310.0, ans=0.125 2024-08-13 15:01:47,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2178310.0, ans=0.1 2024-08-13 15:01:47,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 450, loss[loss=0.1063, beats_loss=0.008441, ecapa_loss=0.0001691, whisper_loss=0.09616, over 16364.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0106, ecapa_loss=0.0001643, whisper_loss=0.08811, over 3353044.81 frames. 
], batch size: 62, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:01:48,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2178310.0, ans=0.0 2024-08-13 15:01:51,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2178310.0, ans=0.125 2024-08-13 15:01:54,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2178310.0, ans=0.0 2024-08-13 15:02:02,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.405e+01 2.560e+01 2.967e+01 1.017e+02, threshold=5.120e+01, percent-clipped=1.0 2024-08-13 15:02:11,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2024-08-13 15:02:14,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2024-08-13 15:02:23,647 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 15:02:34,807 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 15:02:39,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2178610.0, ans=0.125 2024-08-13 15:02:40,248 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 33 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 15:02:54,527 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.209e+05 2024-08-13 15:03:00,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 500, loss[loss=0.08465, beats_loss=0.01142, ecapa_loss=0.0001454, whisper_loss=0.07177, over 16786.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001636, whisper_loss=0.08915, over 3476262.54 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:03:23,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2178910.0, ans=0.2 2024-08-13 15:04:14,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 550, loss[loss=0.1096, beats_loss=0.009818, ecapa_loss=0.0001679, whisper_loss=0.09811, over 22591.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001626, whisper_loss=0.08997, over 3555101.59 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:04:15,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2179310.0, ans=0.0 2024-08-13 15:04:15,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2179310.0, ans=0.95 2024-08-13 15:04:20,612 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-13 15:04:29,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.286e+01 2.542e+01 2.908e+01 4.014e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-13 15:04:30,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=12.0 2024-08-13 15:04:33,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2179410.0, ans=0.0 2024-08-13 15:04:42,391 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
31 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 15:04:51,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2179510.0, ans=0.05 2024-08-13 15:04:56,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2179510.0, ans=0.0 2024-08-13 15:04:59,429 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 15:05:05,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2179610.0, ans=0.0 2024-08-13 15:05:06,655 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 15:05:07,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2179610.0, ans=0.125 2024-08-13 15:05:23,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2179710.0, ans=0.125 2024-08-13 15:05:29,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 600, loss[loss=0.1078, beats_loss=0.01031, ecapa_loss=0.0001679, whisper_loss=0.09578, over 16518.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001608, whisper_loss=0.09065, over 3637158.84 frames. ], batch size: 65, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:05:32,088 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 15:05:40,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2179810.0, ans=0.0 2024-08-13 15:05:47,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.98 vs. 
limit=22.5 2024-08-13 15:05:51,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2179910.0, ans=0.125 2024-08-13 15:05:55,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2179910.0, ans=0.1 2024-08-13 15:06:11,000 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 15:06:12,992 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 15:06:30,857 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 15:06:40,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 650, loss[loss=0.1067, beats_loss=0.01293, ecapa_loss=0.0001632, whisper_loss=0.09212, over 22208.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001597, whisper_loss=0.09038, over 3678533.95 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:06:44,310 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 15:06:47,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2180310.0, ans=0.125 2024-08-13 15:06:55,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.462e+01 2.734e+01 3.167e+01 1.676e+02, threshold=5.468e+01, percent-clipped=3.0 2024-08-13 15:07:02,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2180410.0, ans=0.125 2024-08-13 15:07:13,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2180510.0, ans=0.0 2024-08-13 15:07:43,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2180710.0, ans=0.0 2024-08-13 15:07:53,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2180810.0, ans=0.2 2024-08-13 15:07:54,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 700, loss[loss=0.07899, beats_loss=0.01141, ecapa_loss=0.0001546, whisper_loss=0.06603, over 18228.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001609, whisper_loss=0.09066, over 3727418.08 frames. ], batch size: 72, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:08:05,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2180810.0, ans=0.0 2024-08-13 15:08:22,834 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-13 15:08:24,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2181010.0, ans=0.0 2024-08-13 15:08:48,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2181110.0, ans=0.1 2024-08-13 15:08:53,631 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.035e-03 2024-08-13 15:09:06,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 750, loss[loss=0.1093, beats_loss=0.01279, ecapa_loss=0.0001343, whisper_loss=0.09517, over 19460.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001594, whisper_loss=0.09002, over 3763218.09 frames. ], batch size: 74, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:09:11,506 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-13 15:09:13,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2181310.0, ans=0.125 2024-08-13 15:09:14,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.78 vs. limit=22.5 2024-08-13 15:09:16,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.12 vs. 
limit=22.5 2024-08-13 15:09:17,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2181310.0, ans=0.0 2024-08-13 15:09:21,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.329e+01 2.542e+01 2.977e+01 1.085e+02, threshold=5.083e+01, percent-clipped=1.0 2024-08-13 15:09:26,030 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.532e+00 2024-08-13 15:09:41,948 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 15:09:45,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2181510.0, ans=0.125 2024-08-13 15:09:56,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2181610.0, ans=0.125 2024-08-13 15:09:57,571 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 15:10:03,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2181710.0, ans=0.2 2024-08-13 15:10:18,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=22.5 2024-08-13 15:10:18,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 800, loss[loss=0.1074, beats_loss=0.009977, ecapa_loss=0.0001513, whisper_loss=0.09592, over 18521.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001599, whisper_loss=0.08989, over 3760919.67 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:10:23,282 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-13 15:10:39,395 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
38 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 15:10:55,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2182010.0, ans=0.0 2024-08-13 15:11:08,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2024-08-13 15:11:16,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2024-08-13 15:11:19,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-13 15:11:28,619 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 15:11:32,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 850, loss[loss=0.08229, beats_loss=0.01076, ecapa_loss=0.0001729, whisper_loss=0.06981, over 17035.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001599, whisper_loss=0.09, over 3767285.74 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:11:41,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. 
limit=15.0 2024-08-13 15:11:45,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2182410.0, ans=0.0 2024-08-13 15:11:46,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.398e+01 2.663e+01 2.990e+01 7.176e+01, threshold=5.326e+01, percent-clipped=1.0 2024-08-13 15:11:51,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2182410.0, ans=0.0 2024-08-13 15:11:52,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2182410.0, ans=0.125 2024-08-13 15:11:53,660 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-13 15:12:15,606 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 15:12:17,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2182610.0, ans=0.125 2024-08-13 15:12:30,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2182710.0, ans=0.025 2024-08-13 15:12:41,510 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 15:12:44,511 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 900, loss[loss=0.09637, beats_loss=0.01291, ecapa_loss=0.0001534, whisper_loss=0.08193, over 23304.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001604, whisper_loss=0.09015, over 3764552.12 frames. 
], batch size: 96, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:12:53,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2182810.0, ans=0.1 2024-08-13 15:13:04,887 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 25 from LS+wenet, 8 from Vox, 23 fro AS 2024-08-13 15:13:19,711 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 15:13:21,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2183010.0, ans=0.1 2024-08-13 15:13:23,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2183010.0, ans=0.2 2024-08-13 15:13:33,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=15.0 2024-08-13 15:13:40,676 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 15:13:51,578 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 15:13:52,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2183210.0, ans=0.2 2024-08-13 15:13:57,665 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 15:13:59,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 950, loss[loss=0.1165, beats_loss=0.009959, ecapa_loss=0.0001564, whisper_loss=0.105, over 23130.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001597, whisper_loss=0.0896, over 3770499.40 frames. 
], batch size: 90, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:14:08,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2183310.0, ans=0.125 2024-08-13 15:14:13,557 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.387e+01 2.716e+01 2.954e+01 4.081e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 15:14:24,121 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 15:14:24,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2183410.0, ans=0.0 2024-08-13 15:14:28,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2024-08-13 15:14:35,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2183510.0, ans=0.0 2024-08-13 15:14:35,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0 2024-08-13 15:14:38,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2183510.0, ans=0.125 2024-08-13 15:14:53,896 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.721e-03 2024-08-13 15:14:59,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2183710.0, ans=0.0 2024-08-13 15:15:00,517 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 15:15:14,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1000, loss[loss=0.1112, beats_loss=0.008892, ecapa_loss=0.0001558, whisper_loss=0.1007, over 16281.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001578, whisper_loss=0.08924, over 3791908.57 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:15:18,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-13 15:15:23,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2183810.0, ans=0.5 2024-08-13 15:15:25,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2183810.0, ans=0.125 2024-08-13 15:15:37,477 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-13 15:16:09,150 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 15:16:27,828 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 15:16:29,315 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1050, loss[loss=0.1159, beats_loss=0.009174, ecapa_loss=0.0001712, whisper_loss=0.105, over 15805.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.08937, over 3812016.76 frames. ], batch size: 57, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:16:29,463 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
22 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-13 15:16:37,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2184310.0, ans=0.125 2024-08-13 15:16:42,758 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 15:16:43,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.435e+01 2.686e+01 3.027e+01 6.105e+01, threshold=5.372e+01, percent-clipped=2.0 2024-08-13 15:16:47,126 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 15:16:54,269 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 15:17:17,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2184610.0, ans=0.125 2024-08-13 15:17:26,524 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 15:17:29,177 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 15:17:31,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2184710.0, ans=0.1 2024-08-13 15:17:31,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2184710.0, ans=0.125 2024-08-13 15:17:41,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2184710.0, ans=0.0 2024-08-13 15:17:43,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1100, loss[loss=0.11, beats_loss=0.01002, ecapa_loss=0.0001426, whisper_loss=0.09855, over 23526.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0107, ecapa_loss=0.0001576, whisper_loss=0.08987, over 3836239.64 frames. 
], batch size: 90, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:17:56,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2184810.0, ans=0.5 2024-08-13 15:18:05,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2184910.0, ans=0.125 2024-08-13 15:18:13,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2184910.0, ans=0.125 2024-08-13 15:18:32,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2185110.0, ans=0.1 2024-08-13 15:18:53,048 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 15:19:00,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1150, loss[loss=0.09039, beats_loss=0.008107, ecapa_loss=0.0001883, whisper_loss=0.0804, over 15064.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.0001585, whisper_loss=0.08965, over 3848262.41 frames. ], batch size: 60, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:19:09,054 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 15:19:16,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.495e+01 2.743e+01 3.086e+01 4.866e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 15:19:45,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-08-13 15:20:27,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2185710.0, ans=0.025 2024-08-13 15:20:35,859 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 15:20:45,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1200, loss[loss=0.08874, beats_loss=0.0114, ecapa_loss=0.0001625, whisper_loss=0.07572, over 20259.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001584, whisper_loss=0.08987, over 3830473.26 frames. ], batch size: 82, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:20:49,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2185810.0, ans=0.125 2024-08-13 15:20:53,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2185810.0, ans=0.0 2024-08-13 15:20:55,393 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.213e+01 2024-08-13 15:21:12,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. 
limit=6.0 2024-08-13 15:21:19,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2186010.0, ans=0.1 2024-08-13 15:21:22,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2186010.0, ans=0.125 2024-08-13 15:21:24,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2186010.0, ans=0.0 2024-08-13 15:21:28,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2186010.0, ans=0.125 2024-08-13 15:21:33,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2186010.0, ans=15.0 2024-08-13 15:21:34,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2186110.0, ans=0.125 2024-08-13 15:21:40,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2186110.0, ans=0.125 2024-08-13 15:21:58,015 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 15:22:04,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2186210.0, ans=0.09899494936611666 2024-08-13 15:22:07,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1250, loss[loss=0.1031, beats_loss=0.009562, ecapa_loss=0.0001394, whisper_loss=0.09211, over 19967.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001583, whisper_loss=0.09003, over 3815279.97 frames. 
], batch size: 75, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:22:15,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2186310.0, ans=0.125 2024-08-13 15:22:22,782 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.204e+01 2.472e+01 2.749e+01 3.995e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-13 15:22:33,698 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 15:23:25,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1300, loss[loss=0.09313, beats_loss=0.01108, ecapa_loss=0.0001381, whisper_loss=0.08067, over 22554.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001581, whisper_loss=0.09019, over 3826681.81 frames. ], batch size: 89, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:23:31,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2024-08-13 15:23:42,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-08-13 15:23:43,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2186910.0, ans=0.125 2024-08-13 15:23:57,662 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 15:23:58,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2187010.0, ans=0.125 2024-08-13 15:23:58,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2187010.0, ans=0.125 2024-08-13 15:24:06,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2187010.0, ans=10.0 2024-08-13 15:24:19,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2187110.0, ans=0.2 2024-08-13 15:24:41,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1350, loss[loss=0.1134, beats_loss=0.008463, ecapa_loss=0.0001963, whisper_loss=0.103, over 16619.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.000158, whisper_loss=0.09019, over 3843509.04 frames. ], batch size: 66, lr: 3.97e-03, grad_scale: 1.152921504606847e+18 2024-08-13 15:24:48,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.81 vs. limit=6.0 2024-08-13 15:24:52,271 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 15:25:00,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.385e+01 2.728e+01 3.101e+01 1.009e+02, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:25:07,475 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 15:25:09,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2187410.0, ans=0.0 2024-08-13 15:25:13,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2187510.0, ans=0.0 2024-08-13 15:25:14,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2024-08-13 15:25:16,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2187510.0, ans=0.0 2024-08-13 15:25:29,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2187610.0, ans=0.125 2024-08-13 15:25:32,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2187610.0, ans=0.0 2024-08-13 15:25:44,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2187710.0, ans=0.125 2024-08-13 15:25:52,594 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 15:25:57,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1400, loss[loss=0.1045, beats_loss=0.01008, ecapa_loss=0.0001336, whisper_loss=0.09313, over 15432.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001585, whisper_loss=0.09074, over 3820194.34 frames. ], batch size: 58, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:26:09,432 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 15:26:10,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2187810.0, ans=0.125 2024-08-13 15:26:15,318 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 15:26:36,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2024-08-13 15:26:48,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2188110.0, ans=0.0 2024-08-13 15:26:59,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2188210.0, ans=0.1 2024-08-13 15:27:00,541 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 15:27:04,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2024-08-13 15:27:23,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1450, loss[loss=0.08984, beats_loss=0.009198, ecapa_loss=0.000166, whisper_loss=0.07898, over 13147.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001586, whisper_loss=0.09067, over 3790558.71 frames. ], batch size: 54, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:27:27,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.38 vs. limit=22.5 2024-08-13 15:27:28,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=12.0 2024-08-13 15:27:30,187 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 15:27:35,046 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 15:27:37,023 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 15:27:40,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.325e+01 2.552e+01 2.880e+01 5.017e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-13 15:27:41,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2188410.0, ans=0.0 2024-08-13 15:27:41,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2188410.0, ans=0.125 2024-08-13 15:27:43,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2188410.0, ans=0.1 2024-08-13 15:27:45,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2188410.0, ans=0.0 2024-08-13 15:27:46,135 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 15:27:55,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2188510.0, ans=10.0 2024-08-13 15:27:55,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2188510.0, ans=0.1 2024-08-13 15:28:04,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2188510.0, ans=0.125 2024-08-13 15:28:43,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1500, loss[loss=0.09521, beats_loss=0.01069, ecapa_loss=0.000174, whisper_loss=0.08278, over 20986.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001576, whisper_loss=0.09003, over 3828318.01 frames. ], batch size: 86, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:29:21,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2189010.0, ans=0.125 2024-08-13 15:29:43,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2189110.0, ans=0.0 2024-08-13 15:29:43,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2024-08-13 15:29:44,430 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 15:29:57,857 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 15:30:01,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2189210.0, ans=0.125 2024-08-13 15:30:04,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1550, loss[loss=0.08295, beats_loss=0.009579, ecapa_loss=0.0001387, whisper_loss=0.07198, over 13984.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001564, whisper_loss=0.0904, over 3814516.88 frames. ], batch size: 55, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:30:10,016 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
20 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 15:30:21,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2189410.0, ans=0.1 2024-08-13 15:30:23,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.248e+01 2.490e+01 2.864e+01 4.046e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-13 15:30:29,168 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-13 15:30:41,473 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 15:30:45,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-13 15:30:45,778 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 15:31:05,921 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 15:31:08,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2189610.0, ans=0.05 2024-08-13 15:31:24,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2189710.0, ans=0.125 2024-08-13 15:31:26,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1600, loss[loss=0.1005, beats_loss=0.01229, ecapa_loss=0.0001562, whisper_loss=0.08662, over 15670.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001566, whisper_loss=0.09083, over 3832699.63 frames. ], batch size: 63, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:31:47,583 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-13 15:31:48,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2189910.0, ans=0.125 2024-08-13 15:32:41,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0 2024-08-13 15:32:43,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.93 vs. limit=10.0 2024-08-13 15:32:46,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1650, loss[loss=0.1042, beats_loss=0.01341, ecapa_loss=0.0001229, whisper_loss=0.08953, over 15771.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01066, ecapa_loss=0.0001563, whisper_loss=0.09066, over 3826863.47 frames. ], batch size: 60, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:32:48,352 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 15:32:52,988 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-13 15:32:54,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. 
limit=22.5 2024-08-13 15:33:03,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.370e+01 2.654e+01 3.120e+01 7.882e+01, threshold=5.308e+01, percent-clipped=3.0 2024-08-13 15:33:17,864 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.553e+01 2024-08-13 15:33:20,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2190510.0, ans=0.1 2024-08-13 15:33:27,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2190510.0, ans=0.125 2024-08-13 15:33:43,700 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-13 15:33:56,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2190710.0, ans=0.125 2024-08-13 15:34:04,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2190810.0, ans=0.125 2024-08-13 15:34:04,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1700, loss[loss=0.1001, beats_loss=0.008272, ecapa_loss=0.0001309, whisper_loss=0.09056, over 15221.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001571, whisper_loss=0.09088, over 3844388.21 frames. 
], batch size: 54, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:34:07,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2190810.0, ans=0.0 2024-08-13 15:34:08,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2190810.0, ans=0.015 2024-08-13 15:34:10,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2190810.0, ans=10.0 2024-08-13 15:34:18,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2190810.0, ans=0.5 2024-08-13 15:34:28,813 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 15:34:35,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2191010.0, ans=0.125 2024-08-13 15:34:40,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2191010.0, ans=0.05 2024-08-13 15:34:41,581 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 15:34:46,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2191010.0, ans=15.0 2024-08-13 15:34:56,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2191110.0, ans=0.125 2024-08-13 15:34:59,309 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 15:35:01,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2191110.0, ans=0.2 2024-08-13 15:35:19,089 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
34 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 15:35:19,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2191210.0, ans=0.025 2024-08-13 15:35:19,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-08-13 15:35:19,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2024-08-13 15:35:21,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1750, loss[loss=0.1124, beats_loss=0.008087, ecapa_loss=0.0001716, whisper_loss=0.1026, over 15948.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001577, whisper_loss=0.09135, over 3857942.26 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:35:23,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2191310.0, ans=0.125 2024-08-13 15:35:25,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2191310.0, ans=0.125 2024-08-13 15:35:27,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2191310.0, ans=0.035 2024-08-13 15:35:28,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2191310.0, ans=0.125 2024-08-13 15:35:36,261 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
16 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-13 15:35:37,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.448e+01 2.728e+01 3.089e+01 6.360e+01, threshold=5.456e+01, percent-clipped=3.0 2024-08-13 15:36:18,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2191610.0, ans=0.125 2024-08-13 15:36:24,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2191710.0, ans=0.125 2024-08-13 15:36:25,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2191710.0, ans=0.125 2024-08-13 15:36:27,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2191710.0, ans=0.125 2024-08-13 15:36:35,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1800, loss[loss=0.1306, beats_loss=0.01166, ecapa_loss=0.0001223, whisper_loss=0.1177, over 22138.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001583, whisper_loss=0.09116, over 3863460.98 frames. ], batch size: 83, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:36:42,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2191810.0, ans=0.1 2024-08-13 15:37:11,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2192010.0, ans=0.0 2024-08-13 15:37:14,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2192010.0, ans=0.125 2024-08-13 15:37:21,323 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
13 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 15:37:50,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1850, loss[loss=0.1278, beats_loss=0.009773, ecapa_loss=0.0001887, whisper_loss=0.1161, over 18407.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.000158, whisper_loss=0.09116, over 3848684.00 frames. ], batch size: 73, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:37:58,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2192310.0, ans=0.0 2024-08-13 15:38:06,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.626e+01 2.890e+01 6.922e+01, threshold=5.252e+01, percent-clipped=1.0 2024-08-13 15:38:08,964 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 15:38:09,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2192410.0, ans=0.125 2024-08-13 15:38:13,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2192410.0, ans=0.035 2024-08-13 15:38:15,802 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 15:38:18,620 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 15:38:20,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2192510.0, ans=0.125 2024-08-13 15:38:28,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2192510.0, ans=0.125 2024-08-13 15:38:37,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. 
limit=15.0 2024-08-13 15:38:44,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=12.0 2024-08-13 15:38:45,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2192610.0, ans=0.09899494936611666 2024-08-13 15:39:03,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1900, loss[loss=0.09553, beats_loss=0.0106, ecapa_loss=0.0001755, whisper_loss=0.08318, over 13839.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001576, whisper_loss=0.0902, over 3814605.58 frames. ], batch size: 55, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:39:09,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2192810.0, ans=0.125 2024-08-13 15:39:10,699 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 15:39:27,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2192910.0, ans=0.04949747468305833 2024-08-13 15:39:38,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2193010.0, ans=0.125 2024-08-13 15:39:41,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2193010.0, ans=0.2 2024-08-13 15:39:45,672 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 15:39:49,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2193110.0, ans=0.125 2024-08-13 15:40:03,334 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 15:40:16,857 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
9 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-13 15:40:18,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 1950, loss[loss=0.05541, beats_loss=0.01165, ecapa_loss=0.000167, whisper_loss=0.04209, over 16517.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001585, whisper_loss=0.08906, over 3793511.88 frames. ], batch size: 68, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:40:18,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2193310.0, ans=0.0 2024-08-13 15:40:25,366 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 15:40:29,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-13 15:40:33,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2193410.0, ans=0.1 2024-08-13 15:40:34,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.351e+01 2.582e+01 2.888e+01 8.249e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-13 15:40:41,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2193410.0, ans=0.0 2024-08-13 15:40:41,950 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-13 15:40:47,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2193510.0, ans=0.1 2024-08-13 15:40:52,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.19 vs. limit=10.0 2024-08-13 15:41:01,505 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 15:41:04,663 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 15:41:07,667 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 15:41:21,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2193710.0, ans=0.1 2024-08-13 15:41:33,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2000, loss[loss=0.1186, beats_loss=0.009412, ecapa_loss=0.000198, whisper_loss=0.1073, over 18644.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001584, whisper_loss=0.08981, over 3804655.77 frames. ], batch size: 75, lr: 3.97e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:41:36,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2024-08-13 15:42:04,111 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 15:42:07,554 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 15:42:14,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2194010.0, ans=0.0 2024-08-13 15:42:20,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-13 15:42:25,618 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 15:42:26,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2194110.0, ans=0.0 2024-08-13 15:42:43,150 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
28 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 15:42:47,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2050, loss[loss=0.0921, beats_loss=0.01184, ecapa_loss=0.00018, whisper_loss=0.07846, over 20883.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01073, ecapa_loss=0.0001591, whisper_loss=0.08947, over 3795424.89 frames. ], batch size: 89, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:42:52,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2194310.0, ans=0.0 2024-08-13 15:42:56,375 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 15:42:57,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2194310.0, ans=0.125 2024-08-13 15:43:03,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2194410.0, ans=0.125 2024-08-13 15:43:03,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.358e+01 2.622e+01 3.012e+01 4.492e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-13 15:43:04,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2194410.0, ans=0.2 2024-08-13 15:43:08,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2194410.0, ans=0.2 2024-08-13 15:43:09,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2194410.0, ans=0.1 2024-08-13 15:43:12,287 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 15:43:13,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2194410.0, ans=0.125 2024-08-13 15:43:23,689 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 15:43:39,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2194610.0, ans=0.025 2024-08-13 15:43:42,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2194610.0, ans=0.125 2024-08-13 15:43:46,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2194710.0, ans=0.125 2024-08-13 15:43:52,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=15.0 2024-08-13 15:44:02,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2100, loss[loss=0.07131, beats_loss=0.01355, ecapa_loss=0.0001442, whisper_loss=0.05632, over 22312.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01084, ecapa_loss=0.0001575, whisper_loss=0.08944, over 3803103.78 frames. ], batch size: 94, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:44:02,639 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.740e+01 2024-08-13 15:44:03,449 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 15:44:08,950 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
17 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 15:44:18,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2194910.0, ans=0.125 2024-08-13 15:44:38,460 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 15:44:39,997 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 15:44:45,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2195110.0, ans=0.5 2024-08-13 15:44:55,799 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 15:45:14,430 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2150, loss[loss=0.09565, beats_loss=0.01352, ecapa_loss=0.0001599, whisper_loss=0.08054, over 17213.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01081, ecapa_loss=0.0001574, whisper_loss=0.08954, over 3810221.10 frames. ], batch size: 73, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:45:21,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2195310.0, ans=15.0 2024-08-13 15:45:23,489 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 15:45:30,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.427e+01 2.711e+01 3.071e+01 5.101e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-13 15:45:57,809 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 15:46:01,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2195610.0, ans=0.05 2024-08-13 15:46:04,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2195610.0, ans=0.1 2024-08-13 15:46:10,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-13 15:46:11,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2195610.0, ans=0.05 2024-08-13 15:46:29,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2200, loss[loss=0.09997, beats_loss=0.009906, ecapa_loss=0.0001482, whisper_loss=0.08858, over 21617.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001581, whisper_loss=0.08987, over 3792731.94 frames. ], batch size: 83, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:46:44,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.63 vs. 
limit=15.0 2024-08-13 15:46:46,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2195910.0, ans=0.5 2024-08-13 15:46:58,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2195910.0, ans=0.125 2024-08-13 15:47:22,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2196110.0, ans=0.0 2024-08-13 15:47:29,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2196210.0, ans=0.125 2024-08-13 15:47:45,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2250, loss[loss=0.08892, beats_loss=0.01205, ecapa_loss=0.000125, whisper_loss=0.07562, over 14646.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01084, ecapa_loss=0.000157, whisper_loss=0.09015, over 3812735.31 frames. ], batch size: 54, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:47:53,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2024-08-13 15:48:01,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2196410.0, ans=0.125 2024-08-13 15:48:01,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.332e+01 2.611e+01 2.967e+01 5.729e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 15:48:14,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2196510.0, ans=0.125 2024-08-13 15:48:23,175 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
21 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 15:48:35,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.23 vs. limit=22.5 2024-08-13 15:48:37,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2196610.0, ans=0.125 2024-08-13 15:48:48,729 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-13 15:48:56,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2196710.0, ans=0.95 2024-08-13 15:49:00,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2300, loss[loss=0.1351, beats_loss=0.008924, ecapa_loss=0.0001534, whisper_loss=0.1246, over 14390.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001579, whisper_loss=0.09084, over 3870556.62 frames. ], batch size: 54, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:49:00,658 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 15:49:18,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2196910.0, ans=0.0 2024-08-13 15:49:26,186 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 15:49:26,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5 2024-08-13 15:49:36,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2197010.0, ans=0.125 2024-08-13 15:49:46,696 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 15:50:13,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-08-13 15:50:14,897 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2350, loss[loss=0.1013, beats_loss=0.01071, ecapa_loss=0.0001906, whisper_loss=0.08868, over 15684.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001601, whisper_loss=0.09055, over 3858652.11 frames. ], batch size: 63, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:50:20,611 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 15:50:29,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2197410.0, ans=0.125 2024-08-13 15:50:31,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.477e+01 2.777e+01 3.066e+01 6.337e+01, threshold=5.554e+01, percent-clipped=1.0 2024-08-13 15:50:31,826 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 15:50:36,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-13 15:50:38,213 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 15:50:52,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2197510.0, ans=0.2 2024-08-13 15:50:54,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2197510.0, ans=0.0 2024-08-13 15:50:57,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2197510.0, ans=0.125 2024-08-13 15:51:02,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2197610.0, ans=0.1 2024-08-13 15:51:05,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2197610.0, ans=0.125 2024-08-13 15:51:14,078 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 15:51:20,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2197710.0, ans=0.125 2024-08-13 15:51:26,153 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 15:51:26,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2024-08-13 15:51:27,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.59 vs. limit=22.5 2024-08-13 15:51:29,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2197810.0, ans=0.125 2024-08-13 15:51:30,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2400, loss[loss=0.07166, beats_loss=0.01333, ecapa_loss=0.0001428, whisper_loss=0.0569, over 21969.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001602, whisper_loss=0.09048, over 3891065.84 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:51:36,161 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 15:51:36,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2197810.0, ans=0.0 2024-08-13 15:51:36,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-13 15:52:27,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.46 vs. limit=15.0 2024-08-13 15:52:28,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2198210.0, ans=0.125 2024-08-13 15:52:36,236 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 15:52:42,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2450, loss[loss=0.1175, beats_loss=0.007647, ecapa_loss=0.0001985, whisper_loss=0.1079, over 18850.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001609, whisper_loss=0.09046, over 3867098.85 frames. ], batch size: 76, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:52:57,517 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 15:52:58,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.483e+01 2.773e+01 3.111e+01 4.520e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-13 15:53:07,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2198410.0, ans=0.1 2024-08-13 15:53:10,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2198510.0, ans=0.0 2024-08-13 15:53:11,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2198510.0, ans=0.125 2024-08-13 15:53:28,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2198610.0, ans=0.2 2024-08-13 15:53:36,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2198610.0, ans=0.025 2024-08-13 15:53:49,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2198710.0, ans=0.1 2024-08-13 15:53:53,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2500, loss[loss=0.09908, beats_loss=0.00856, ecapa_loss=0.0001202, whisper_loss=0.08932, over 16538.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01069, ecapa_loss=0.0001621, whisper_loss=0.09096, over 3890782.01 frames. ], batch size: 59, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:53:55,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2198810.0, ans=0.0 2024-08-13 15:54:02,381 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 15:54:30,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2199010.0, ans=0.0 2024-08-13 15:54:48,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=22.5 2024-08-13 15:55:01,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2199210.0, ans=0.125 2024-08-13 15:55:03,782 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 15:55:06,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2550, loss[loss=0.08476, beats_loss=0.01152, ecapa_loss=0.0001399, whisper_loss=0.07185, over 21568.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001603, whisper_loss=0.09144, over 3888597.46 frames. ], batch size: 86, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:55:06,807 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 15:55:21,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.351e+01 2.676e+01 3.107e+01 6.569e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 15:55:22,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.57 vs. limit=22.5 2024-08-13 15:55:32,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2199410.0, ans=0.125 2024-08-13 15:55:41,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2199510.0, ans=0.0 2024-08-13 15:56:08,462 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 15:56:16,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-13 15:56:17,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2600, loss[loss=0.1072, beats_loss=0.01092, ecapa_loss=0.0001701, whisper_loss=0.09455, over 21806.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001618, whisper_loss=0.09076, over 3868299.12 frames. ], batch size: 88, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:56:21,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2199810.0, ans=0.125 2024-08-13 15:56:34,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.27 vs. limit=22.5 2024-08-13 15:56:38,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2199910.0, ans=0.1 2024-08-13 15:56:53,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2200010.0, ans=0.125 2024-08-13 15:56:55,445 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 15:57:21,492 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 13 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 15:57:23,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2200210.0, ans=0.125 2024-08-13 15:57:31,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.04 vs. 
limit=12.0 2024-08-13 15:57:33,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2650, loss[loss=0.08763, beats_loss=0.01217, ecapa_loss=0.0001813, whisper_loss=0.07365, over 20893.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001633, whisper_loss=0.09049, over 3874096.99 frames. ], batch size: 90, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:57:35,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.60 vs. limit=22.5 2024-08-13 15:57:39,314 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 15:57:42,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2200310.0, ans=0.1 2024-08-13 15:57:49,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.280e+01 2.561e+01 2.894e+01 4.049e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-13 15:57:52,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2200410.0, ans=0.0 2024-08-13 15:58:12,585 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 15:58:41,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2200710.0, ans=0.125 2024-08-13 15:58:49,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2200710.0, ans=0.0 2024-08-13 15:58:49,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2200710.0, ans=0.1 2024-08-13 15:58:52,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2700, loss[loss=0.113, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.1011, over 16487.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001626, whisper_loss=0.09102, over 3891576.07 frames. ], batch size: 63, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 15:59:00,844 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 15:59:13,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2200910.0, ans=0.125 2024-08-13 15:59:21,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2200910.0, ans=0.0 2024-08-13 15:59:46,870 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 15:59:47,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2201110.0, ans=0.125 2024-08-13 15:59:50,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-08-13 16:00:17,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2750, loss[loss=0.1101, beats_loss=0.01106, ecapa_loss=0.0001531, whisper_loss=0.09755, over 21699.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001616, whisper_loss=0.09083, over 3897957.69 frames. 
], batch size: 84, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:00:34,194 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.357e+01 2.643e+01 3.055e+01 4.900e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-13 16:00:35,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2201410.0, ans=0.0 2024-08-13 16:00:53,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2201510.0, ans=0.125 2024-08-13 16:01:16,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2201610.0, ans=0.125 2024-08-13 16:01:24,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2201710.0, ans=0.125 2024-08-13 16:01:33,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2201810.0, ans=0.0 2024-08-13 16:01:33,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2800, loss[loss=0.08479, beats_loss=0.01044, ecapa_loss=0.0002139, whisper_loss=0.07221, over 20141.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001615, whisper_loss=0.09121, over 3876618.81 frames. ], batch size: 88, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:01:39,200 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 16:01:45,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2201810.0, ans=0.1 2024-08-13 16:01:45,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-08-13 16:01:47,828 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
22 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 16:01:59,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2201910.0, ans=0.0 2024-08-13 16:02:01,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2201910.0, ans=0.125 2024-08-13 16:02:08,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2202010.0, ans=0.0 2024-08-13 16:02:09,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-08-13 16:02:15,168 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 16:02:16,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2202010.0, ans=0.1 2024-08-13 16:02:20,273 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 16:02:21,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2202110.0, ans=0.95 2024-08-13 16:02:32,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2024-08-13 16:02:50,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2850, loss[loss=0.1, beats_loss=0.01053, ecapa_loss=0.0001582, whisper_loss=0.08791, over 22281.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.000161, whisper_loss=0.0907, over 3863999.61 frames. ], batch size: 91, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:02:51,812 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 16:02:58,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2202310.0, ans=0.0 2024-08-13 16:03:06,235 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.950e-02 2024-08-13 16:03:08,643 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.307e+01 2.674e+01 3.004e+01 5.549e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 16:03:14,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2202410.0, ans=0.125 2024-08-13 16:03:15,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2202410.0, ans=0.125 2024-08-13 16:03:52,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2202610.0, ans=0.1 2024-08-13 16:03:58,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2202710.0, ans=0.0 2024-08-13 16:04:01,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-13 16:04:04,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2202710.0, ans=0.025 2024-08-13 16:04:10,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2900, loss[loss=0.1063, beats_loss=0.0119, ecapa_loss=0.0001539, whisper_loss=0.09284, over 21173.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001616, whisper_loss=0.09047, over 3859119.50 frames. 
], batch size: 82, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:04:23,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2202810.0, ans=0.125 2024-08-13 16:04:24,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2024-08-13 16:04:33,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-08-13 16:04:35,366 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 16:04:48,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2203010.0, ans=0.125 2024-08-13 16:04:55,958 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 16:04:59,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2203110.0, ans=0.0 2024-08-13 16:05:08,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2203210.0, ans=0.0 2024-08-13 16:05:23,055 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 2950, loss[loss=0.1109, beats_loss=0.01213, ecapa_loss=0.0001591, whisper_loss=0.09714, over 22637.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001619, whisper_loss=0.09068, over 3865885.65 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:05:28,040 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 16:05:30,734 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 16:05:37,837 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 16:05:38,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.341e+01 2.613e+01 3.038e+01 5.265e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-13 16:05:43,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-13 16:05:46,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2203410.0, ans=0.1 2024-08-13 16:05:57,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2203510.0, ans=0.0 2024-08-13 16:06:08,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2203610.0, ans=0.125 2024-08-13 16:06:19,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2203710.0, ans=0.0 2024-08-13 16:06:19,099 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:06:21,995 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:06:26,872 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 16:06:32,155 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3000, loss[loss=0.1355, beats_loss=0.006946, ecapa_loss=0.000154, whisper_loss=0.127, over 22803.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001632, whisper_loss=0.09094, over 3870604.38 frames. 
], batch size: 84, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:06:32,155 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 16:07:12,429 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2474, over 922467.00 frames. 2024-08-13 16:07:30,327 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on SV_voxceleb1: loss=0.004334, beats_loss=0, ecapa_loss=0.0004334, whisper_loss=0, over 939242.00 frames. 2024-08-13 16:09:55,762 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on AT_audioset: loss=0.02373, beats_loss=0.02373, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 16:09:55,766 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-13 16:10:16,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.27 vs. limit=22.5 2024-08-13 16:10:29,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2203910.0, ans=0.025 2024-08-13 16:10:56,318 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 16:11:27,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3050, loss[loss=0.116, beats_loss=0.01031, ecapa_loss=0.0002, whisper_loss=0.1037, over 21803.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0108, ecapa_loss=0.0001632, whisper_loss=0.0914, over 3895721.29 frames. 
], batch size: 91, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:11:42,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.448e+01 2.786e+01 3.089e+01 5.850e+01, threshold=5.572e+01, percent-clipped=2.0 2024-08-13 16:11:49,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2204410.0, ans=0.0 2024-08-13 16:11:51,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2204410.0, ans=0.0 2024-08-13 16:11:54,603 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 16:11:59,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2204510.0, ans=0.125 2024-08-13 16:12:00,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2204510.0, ans=0.2 2024-08-13 16:12:13,557 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 16:12:35,611 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3100, loss[loss=0.1067, beats_loss=0.01036, ecapa_loss=0.0001621, whisper_loss=0.09476, over 23191.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001631, whisper_loss=0.09164, over 3907520.47 frames. ], batch size: 92, lr: 3.96e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:12:43,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.07 vs. 
limit=15.0 2024-08-13 16:13:13,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2205010.0, ans=0.05 2024-08-13 16:13:16,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2205110.0, ans=0.0 2024-08-13 16:13:16,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-13 16:13:28,863 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 16:13:30,113 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 16:13:34,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2205210.0, ans=0.1 2024-08-13 16:13:40,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.99 vs. limit=10.0 2024-08-13 16:13:42,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2205210.0, ans=0.025 2024-08-13 16:13:45,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3150, loss[loss=0.1119, beats_loss=0.01182, ecapa_loss=0.0001769, whisper_loss=0.09828, over 23264.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001626, whisper_loss=0.09137, over 3877332.95 frames. 
], batch size: 92, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:13:47,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2205310.0, ans=0.125 2024-08-13 16:14:00,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.370e+01 2.702e+01 3.002e+01 4.700e+01, threshold=5.405e+01, percent-clipped=0.0 2024-08-13 16:14:20,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2205510.0, ans=0.125 2024-08-13 16:14:53,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2205710.0, ans=0.0 2024-08-13 16:14:54,047 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 16:14:56,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3200, loss[loss=0.1353, beats_loss=0.006394, ecapa_loss=0.0001917, whisper_loss=0.127, over 18508.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01078, ecapa_loss=0.0001628, whisper_loss=0.09259, over 3847457.43 frames. ], batch size: 71, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:15:12,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2205910.0, ans=0.2 2024-08-13 16:15:18,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2205910.0, ans=0.125 2024-08-13 16:15:19,984 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 16:15:26,730 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 16:15:27,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. 
limit=6.0 2024-08-13 16:15:31,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2206010.0, ans=0.125 2024-08-13 16:15:31,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2206010.0, ans=0.0 2024-08-13 16:15:43,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2206110.0, ans=0.125 2024-08-13 16:15:57,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2206210.0, ans=0.0 2024-08-13 16:15:57,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2206210.0, ans=0.0 2024-08-13 16:15:58,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2206210.0, ans=0.125 2024-08-13 16:16:10,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.75 vs. limit=10.0 2024-08-13 16:16:10,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3250, loss[loss=0.08844, beats_loss=0.01061, ecapa_loss=0.000135, whisper_loss=0.07648, over 20879.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001636, whisper_loss=0.09194, over 3861482.08 frames. ], batch size: 77, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:16:17,752 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 16:16:25,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.428e+01 2.754e+01 3.023e+01 4.086e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-13 16:17:13,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.855e-02 2024-08-13 16:17:14,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2024-08-13 16:17:14,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2206710.0, ans=0.125 2024-08-13 16:17:15,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2206710.0, ans=0.1 2024-08-13 16:17:16,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2206710.0, ans=0.05 2024-08-13 16:17:17,952 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 16:17:18,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2206710.0, ans=0.125 2024-08-13 16:17:22,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3300, loss[loss=0.1062, beats_loss=0.01042, ecapa_loss=0.000187, whisper_loss=0.09394, over 19262.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001627, whisper_loss=0.09171, over 3873157.52 frames. 
], batch size: 77, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:17:34,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2206810.0, ans=0.125 2024-08-13 16:17:41,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2206910.0, ans=0.125 2024-08-13 16:17:58,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2024-08-13 16:18:18,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2207210.0, ans=10.0 2024-08-13 16:18:19,823 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 16:18:22,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-08-13 16:18:23,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2207210.0, ans=0.125 2024-08-13 16:18:25,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-13 16:18:33,286 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3350, loss[loss=0.1074, beats_loss=0.00871, ecapa_loss=0.0001976, whisper_loss=0.09667, over 22297.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001623, whisper_loss=0.0917, over 3875220.41 frames. 
], batch size: 93, lr: 3.95e-03, grad_scale: 5.764607523034235e+17 2024-08-13 16:18:49,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.421e+01 2.639e+01 2.919e+01 4.017e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:18:53,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-13 16:19:18,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2207610.0, ans=0.0 2024-08-13 16:19:20,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2207610.0, ans=0.1 2024-08-13 16:19:21,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2207610.0, ans=0.125 2024-08-13 16:19:24,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-13 16:19:30,689 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 16:19:39,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2024-08-13 16:19:44,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2207810.0, ans=0.125 2024-08-13 16:19:45,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3400, loss[loss=0.1141, beats_loss=0.009885, ecapa_loss=0.0001695, whisper_loss=0.1025, over 20772.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01079, ecapa_loss=0.0001617, whisper_loss=0.09242, over 3873208.88 frames. 
], batch size: 84, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:20:31,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-13 16:20:36,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2208110.0, ans=0.125 2024-08-13 16:20:37,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2208110.0, ans=0.0 2024-08-13 16:20:40,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2024-08-13 16:20:47,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2208210.0, ans=0.09899494936611666 2024-08-13 16:20:53,693 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 16:20:56,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3450, loss[loss=0.1046, beats_loss=0.01173, ecapa_loss=0.0001538, whisper_loss=0.09136, over 21157.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01079, ecapa_loss=0.0001622, whisper_loss=0.09223, over 3882523.10 frames. 
], batch size: 86, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:21:09,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2208410.0, ans=0.125 2024-08-13 16:21:11,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.477e+01 2.807e+01 3.322e+01 1.527e+02, threshold=5.614e+01, percent-clipped=5.0 2024-08-13 16:21:12,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2208410.0, ans=0.0 2024-08-13 16:21:17,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.10 vs. limit=22.5 2024-08-13 16:21:30,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2208510.0, ans=0.0 2024-08-13 16:21:38,301 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 16:21:48,532 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 30 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 16:22:06,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3500, loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.0001627, whisper_loss=0.09163, over 22617.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001626, whisper_loss=0.09203, over 3889464.55 frames. ], batch size: 92, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:22:14,636 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 16:22:16,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2208810.0, ans=0.2 2024-08-13 16:22:20,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. 
limit=22.5 2024-08-13 16:23:01,155 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 11 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 16:23:02,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2209110.0, ans=0.2 2024-08-13 16:23:07,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2024-08-13 16:23:16,576 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 16:23:18,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2209210.0, ans=0.1 2024-08-13 16:23:20,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3550, loss[loss=0.115, beats_loss=0.01134, ecapa_loss=0.000133, whisper_loss=0.1023, over 23756.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001612, whisper_loss=0.0919, over 3896168.66 frames. ], batch size: 90, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:23:24,759 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 16:23:36,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.460e+01 2.758e+01 3.003e+01 5.341e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-13 16:23:42,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2209410.0, ans=0.1 2024-08-13 16:23:50,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2209510.0, ans=0.125 2024-08-13 16:23:58,206 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
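The "A total of N cuts" lines break each mini-batch down by data source, and the per-source counts always sum to the total. A small checker for these records (note that "fro AS" in the pattern reproduces a typo in the original logging call, kept as-is so the regex matches the actual log):

```python
import re

CUT_RE = re.compile(
    r"A total of (\d+) cuts\. (\d+) from LS\+wenet, (\d+) from Vox, (\d+) fro AS"
)

def parse_cuts(line):
    """Return per-source cut counts, checking they sum to the logged total."""
    total, ls, vox, audioset = map(int, CUT_RE.search(line).groups())
    assert ls + vox + audioset == total, "per-source counts must sum to total"
    return {"LS+wenet": ls, "Vox": vox, "AS": audioset}

counts = parse_cuts("A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS")
```

For the first such record above, `22 + 18 + 32 = 72`, so the invariant holds.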
37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 16:24:06,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2209610.0, ans=0.0 2024-08-13 16:24:08,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2209610.0, ans=0.1 2024-08-13 16:24:22,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2209710.0, ans=0.0 2024-08-13 16:24:26,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2209710.0, ans=0.125 2024-08-13 16:24:33,595 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 16:24:34,787 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3600, loss[loss=0.1055, beats_loss=0.0121, ecapa_loss=0.0001569, whisper_loss=0.09178, over 20768.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0108, ecapa_loss=0.0001611, whisper_loss=0.09243, over 3898983.09 frames. ], batch size: 83, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:24:34,973 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-13 16:24:35,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-13 16:25:02,589 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 16:25:30,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2024-08-13 16:25:31,164 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 16:25:46,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3650, loss[loss=0.1355, beats_loss=0.009091, ecapa_loss=0.0001223, whisper_loss=0.1252, over 24014.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001614, whisper_loss=0.09187, over 3876284.92 frames. ], batch size: 86, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:25:52,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2210310.0, ans=0.0 2024-08-13 16:26:02,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.319e+01 2.686e+01 3.119e+01 4.845e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-13 16:26:07,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=15.0 2024-08-13 16:26:15,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-13 16:26:28,950 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 16:26:44,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2210710.0, ans=0.125 2024-08-13 16:26:47,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2210710.0, ans=0.0 2024-08-13 16:26:56,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3700, loss[loss=0.0961, beats_loss=0.01305, ecapa_loss=0.0001446, whisper_loss=0.0816, over 22907.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01072, ecapa_loss=0.0001614, whisper_loss=0.09209, over 3864272.43 frames. 
], batch size: 93, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:27:11,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-08-13 16:27:15,285 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:27:17,630 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.167e+01 2024-08-13 16:27:19,724 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 16:27:26,789 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 16:27:39,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2211110.0, ans=0.2 2024-08-13 16:27:44,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2211110.0, ans=0.2 2024-08-13 16:27:52,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2211210.0, ans=0.125 2024-08-13 16:28:03,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3750, loss[loss=0.1011, beats_loss=0.01148, ecapa_loss=0.0001382, whisper_loss=0.0882, over 17591.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01072, ecapa_loss=0.0001621, whisper_loss=0.09223, over 3876208.46 frames. ], batch size: 70, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:28:05,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2211310.0, ans=0.0 2024-08-13 16:28:09,813 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
25 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 16:28:17,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.410e+01 2.677e+01 3.009e+01 6.113e+01, threshold=5.354e+01, percent-clipped=1.0 2024-08-13 16:28:24,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2211410.0, ans=15.0 2024-08-13 16:28:26,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2211410.0, ans=0.0 2024-08-13 16:28:27,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2211410.0, ans=0.1 2024-08-13 16:28:34,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.78 vs. limit=22.5 2024-08-13 16:28:44,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2211610.0, ans=0.1 2024-08-13 16:28:50,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2211610.0, ans=0.1 2024-08-13 16:28:59,336 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 16:29:03,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-08-13 16:29:08,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3800, loss[loss=0.09372, beats_loss=0.01267, ecapa_loss=0.0001803, whisper_loss=0.07925, over 21114.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01075, ecapa_loss=0.0001628, whisper_loss=0.09188, over 3852934.32 frames. 
], batch size: 87, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:29:12,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2211810.0, ans=0.0 2024-08-13 16:29:15,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2211810.0, ans=0.125 2024-08-13 16:29:15,960 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 16:29:18,471 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 16:29:22,947 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.200e+01 2024-08-13 16:29:38,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2212010.0, ans=0.0 2024-08-13 16:29:42,249 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:30:11,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2212210.0, ans=0.1 2024-08-13 16:30:13,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3850, loss[loss=0.1162, beats_loss=0.01337, ecapa_loss=0.0001314, whisper_loss=0.1015, over 17866.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001637, whisper_loss=0.09133, over 3840103.90 frames. 
], batch size: 70, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:30:22,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2212310.0, ans=0.0 2024-08-13 16:30:27,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.456e+01 2.760e+01 3.181e+01 8.437e+01, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 16:30:28,812 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-13 16:30:39,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2212510.0, ans=0.0 2024-08-13 16:31:18,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3900, loss[loss=0.1281, beats_loss=0.008216, ecapa_loss=0.0001748, whisper_loss=0.1181, over 15228.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001637, whisper_loss=0.09111, over 3878904.17 frames. 
], batch size: 59, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:31:22,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2212810.0, ans=0.125 2024-08-13 16:31:31,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2212910.0, ans=0.125 2024-08-13 16:31:33,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2212910.0, ans=0.0 2024-08-13 16:31:41,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2212910.0, ans=0.0 2024-08-13 16:31:46,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2213010.0, ans=0.025 2024-08-13 16:31:47,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2213010.0, ans=0.125 2024-08-13 16:31:52,531 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-13 16:31:56,677 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-13 16:31:57,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2213110.0, ans=0.5 2024-08-13 16:32:15,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2213210.0, ans=0.1 2024-08-13 16:32:23,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 3950, loss[loss=0.108, beats_loss=0.009946, ecapa_loss=0.0001965, whisper_loss=0.09612, over 18996.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001638, whisper_loss=0.09115, over 3874224.67 frames. 
], batch size: 79, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:32:23,309 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 16:32:37,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.503e+01 2.824e+01 3.168e+01 4.630e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 16:32:56,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2213510.0, ans=0.1 2024-08-13 16:33:02,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2024-08-13 16:33:16,662 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 16:33:28,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4000, loss[loss=0.1125, beats_loss=0.01187, ecapa_loss=0.0001726, whisper_loss=0.0989, over 20640.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0108, ecapa_loss=0.0001644, whisper_loss=0.09165, over 3886363.20 frames. ], batch size: 83, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:33:30,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2213810.0, ans=0.2 2024-08-13 16:33:32,863 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 19 from Vox, 13 fro AS 2024-08-13 16:33:35,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2213810.0, ans=0.0 2024-08-13 16:33:48,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2213910.0, ans=0.125 2024-08-13 16:33:54,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. 
limit=22.5 2024-08-13 16:34:02,642 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 16:34:14,602 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 16:34:19,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2214210.0, ans=0.125 2024-08-13 16:34:32,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-08-13 16:34:33,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4050, loss[loss=0.1126, beats_loss=0.01059, ecapa_loss=0.0001334, whisper_loss=0.1007, over 15831.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.000165, whisper_loss=0.09192, over 3906853.16 frames. ], batch size: 62, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:34:48,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.510e+01 2.777e+01 3.045e+01 5.508e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-13 16:35:02,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2214510.0, ans=0.125 2024-08-13 16:35:14,439 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 12 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 16:35:27,434 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 16:35:38,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4100, loss[loss=0.116, beats_loss=0.0107, ecapa_loss=0.000208, whisper_loss=0.1032, over 20451.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01068, ecapa_loss=0.0001658, whisper_loss=0.09265, over 3896806.15 frames. 
], batch size: 84, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:35:41,965 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:35:53,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2214910.0, ans=0.125 2024-08-13 16:35:55,798 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 16:35:56,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2214910.0, ans=0.2 2024-08-13 16:36:04,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2024-08-13 16:36:32,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2215210.0, ans=0.1 2024-08-13 16:36:43,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4150, loss[loss=0.08601, beats_loss=0.01289, ecapa_loss=0.0001613, whisper_loss=0.0715, over 21983.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01071, ecapa_loss=0.0001654, whisper_loss=0.09257, over 3901214.30 frames. 
], batch size: 93, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:36:44,486 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:36:46,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2215310.0, ans=0.0 2024-08-13 16:36:50,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2215310.0, ans=0.1 2024-08-13 16:36:57,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.337e+01 2.557e+01 2.975e+01 8.257e+01, threshold=5.114e+01, percent-clipped=2.0 2024-08-13 16:37:27,077 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 30 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-13 16:37:27,348 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:37:33,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-13 16:37:45,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2215710.0, ans=0.125 2024-08-13 16:37:48,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4200, loss[loss=0.1273, beats_loss=0.007498, ecapa_loss=0.0002315, whisper_loss=0.1175, over 17236.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0107, ecapa_loss=0.0001655, whisper_loss=0.09212, over 3896682.45 frames. ], batch size: 70, lr: 3.95e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:38:08,464 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 16:38:09,753 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
20 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 16:38:28,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2216110.0, ans=0.125 2024-08-13 16:38:34,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2216110.0, ans=0.0 2024-08-13 16:38:35,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2216110.0, ans=0.2 2024-08-13 16:38:40,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2216210.0, ans=10.0 2024-08-13 16:38:49,034 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 16:38:50,799 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.112e-01 2024-08-13 16:38:56,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4250, loss[loss=0.08467, beats_loss=0.01417, ecapa_loss=8.833e-05, whisper_loss=0.06961, over 20687.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001638, whisper_loss=0.09176, over 3926762.95 frames. 
], batch size: 81, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:39:10,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2216410.0, ans=0.125 2024-08-13 16:39:12,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.639e+01 2.854e+01 4.176e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-13 16:39:19,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2216410.0, ans=0.2 2024-08-13 16:39:31,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2216510.0, ans=0.2 2024-08-13 16:39:35,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2216510.0, ans=0.125 2024-08-13 16:39:38,003 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 16:40:03,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-13 16:40:06,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2216710.0, ans=0.2 2024-08-13 16:40:11,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4300, loss[loss=0.08555, beats_loss=0.0115, ecapa_loss=0.0001623, whisper_loss=0.07243, over 21167.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001642, whisper_loss=0.0913, over 3898642.80 frames. ], batch size: 89, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:40:12,074 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 16:40:56,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2217110.0, ans=0.125 2024-08-13 16:41:17,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2217210.0, ans=0.0 2024-08-13 16:41:27,565 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4350, loss[loss=0.1294, beats_loss=0.008215, ecapa_loss=0.0001621, whisper_loss=0.1196, over 22567.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001637, whisper_loss=0.09162, over 3880931.47 frames. ], batch size: 84, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:41:29,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2217310.0, ans=0.125 2024-08-13 16:41:30,338 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 16:41:38,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2217310.0, ans=0.125 2024-08-13 16:41:41,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.464e+01 2.794e+01 3.090e+01 4.694e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-13 16:41:57,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2217510.0, ans=0.125 2024-08-13 16:41:59,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2217510.0, ans=0.07 2024-08-13 16:42:02,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2217510.0, ans=0.125 2024-08-13 16:42:04,233 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 16:42:13,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2217610.0, ans=0.0 2024-08-13 16:42:22,380 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 16:42:32,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4400, loss[loss=0.09473, beats_loss=0.01195, ecapa_loss=0.0001843, whisper_loss=0.08094, over 16105.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001633, whisper_loss=0.09098, over 3835242.67 frames. ], batch size: 70, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:42:37,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2217810.0, ans=15.0 2024-08-13 16:42:50,124 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-13 16:43:01,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2218010.0, ans=0.0 2024-08-13 16:43:10,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2218110.0, ans=10.0 2024-08-13 16:43:26,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-08-13 16:43:37,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4450, loss[loss=0.1132, beats_loss=0.01102, ecapa_loss=0.0001605, whisper_loss=0.1005, over 22881.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01081, ecapa_loss=0.0001632, whisper_loss=0.09086, over 3824030.36 frames. 
], batch size: 91, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:43:40,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2218310.0, ans=0.0 2024-08-13 16:43:44,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-13 16:43:51,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.377e+01 2.551e+01 3.070e+01 5.212e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-13 16:43:53,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2218410.0, ans=0.0 2024-08-13 16:43:58,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2218410.0, ans=0.125 2024-08-13 16:44:05,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2218510.0, ans=0.125 2024-08-13 16:44:11,348 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 31 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 16:44:26,789 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-13 16:44:38,509 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.381e-02 2024-08-13 16:44:41,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4500, loss[loss=0.1115, beats_loss=0.01153, ecapa_loss=0.0001656, whisper_loss=0.0983, over 23671.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001636, whisper_loss=0.09026, over 3841765.72 frames. 
], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:44:43,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2218810.0, ans=0.125 2024-08-13 16:44:52,298 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 16:45:07,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2219010.0, ans=10.0 2024-08-13 16:45:08,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2219010.0, ans=0.125 2024-08-13 16:45:12,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2219010.0, ans=0.0 2024-08-13 16:45:13,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2219010.0, ans=0.125 2024-08-13 16:45:28,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2219110.0, ans=0.0 2024-08-13 16:45:43,169 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 16:45:43,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-13 16:45:51,942 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 16:45:55,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4550, loss[loss=0.1114, beats_loss=0.006731, ecapa_loss=0.0002118, whisper_loss=0.1025, over 14704.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01089, ecapa_loss=0.0001634, whisper_loss=0.09068, over 3879529.66 frames. 
], batch size: 56, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:46:12,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.447e+01 2.788e+01 3.187e+01 5.560e+01, threshold=5.575e+01, percent-clipped=2.0 2024-08-13 16:46:29,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2219510.0, ans=0.125 2024-08-13 16:46:41,328 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-13 16:47:31,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4600, loss[loss=0.1108, beats_loss=0.009507, ecapa_loss=0.000183, whisper_loss=0.09941, over 22084.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001632, whisper_loss=0.09047, over 3872859.36 frames. ], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:48:10,595 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:48:16,243 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 16:48:22,051 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 16:49:11,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=12.0 2024-08-13 16:49:24,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4650, loss[loss=0.1098, beats_loss=0.009898, ecapa_loss=0.0001851, whisper_loss=0.09808, over 22672.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001632, whisper_loss=0.09088, over 3886011.22 frames. 
], batch size: 93, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:49:31,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-13 16:49:40,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-13 16:49:47,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2220410.0, ans=0.125 2024-08-13 16:49:49,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-13 16:49:50,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.519e+01 2.721e+01 2.978e+01 4.976e+01, threshold=5.443e+01, percent-clipped=0.0 2024-08-13 16:50:04,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2220410.0, ans=0.125 2024-08-13 16:50:07,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2220410.0, ans=0.125 2024-08-13 16:50:17,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2220510.0, ans=0.1 2024-08-13 16:50:43,885 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 16:50:53,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2220710.0, ans=0.0 2024-08-13 16:51:02,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2220710.0, ans=0.0 2024-08-13 16:51:02,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2220710.0, ans=0.2 2024-08-13 16:51:19,512 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4700, loss[loss=0.08832, beats_loss=0.01118, ecapa_loss=0.000202, whisper_loss=0.07512, over 15655.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001625, whisper_loss=0.09123, over 3890775.55 frames. ], batch size: 68, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:51:42,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2220910.0, ans=0.125 2024-08-13 16:51:49,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2220910.0, ans=0.125 2024-08-13 16:51:55,235 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 16:52:50,020 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 20 from Vox, 14 fro AS 2024-08-13 16:52:54,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2221210.0, ans=0.125 2024-08-13 16:53:01,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4750, loss[loss=0.07657, beats_loss=0.0154, ecapa_loss=0.0001583, whisper_loss=0.05959, over 20196.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01093, ecapa_loss=0.0001616, whisper_loss=0.09057, over 3897858.67 frames. 
], batch size: 85, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:53:13,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2221310.0, ans=0.125 2024-08-13 16:53:17,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.389e+01 2.725e+01 3.065e+01 4.342e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 16:53:35,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-13 16:54:01,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2221710.0, ans=0.125 2024-08-13 16:54:04,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2221710.0, ans=0.125 2024-08-13 16:54:14,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4800, loss[loss=0.09689, beats_loss=0.01174, ecapa_loss=0.0001738, whisper_loss=0.08342, over 19485.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01097, ecapa_loss=0.0001632, whisper_loss=0.08993, over 3902574.29 frames. 
], batch size: 81, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:54:15,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2221810.0, ans=0.2 2024-08-13 16:54:29,368 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.023e+01 2024-08-13 16:54:30,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2221910.0, ans=0.0 2024-08-13 16:54:35,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2221910.0, ans=0.1 2024-08-13 16:54:47,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2222010.0, ans=0.0 2024-08-13 16:55:14,580 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-13 16:55:35,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4850, loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001513, whisper_loss=0.09146, over 22410.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01095, ecapa_loss=0.0001633, whisper_loss=0.09066, over 3919682.21 frames. ], batch size: 89, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:55:38,684 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 16:55:52,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.475e+01 2.681e+01 3.157e+01 5.324e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 16:55:53,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2222410.0, ans=0.0 2024-08-13 16:55:53,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. 
limit=22.5 2024-08-13 16:56:01,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5 2024-08-13 16:56:12,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2222510.0, ans=0.1 2024-08-13 16:56:16,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2222510.0, ans=0.125 2024-08-13 16:56:17,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2222510.0, ans=0.1 2024-08-13 16:56:34,054 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 16:56:34,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2222610.0, ans=0.1 2024-08-13 16:56:45,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5 2024-08-13 16:56:51,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4900, loss[loss=0.09171, beats_loss=0.01213, ecapa_loss=0.0001768, whisper_loss=0.07781, over 14686.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01093, ecapa_loss=0.0001636, whisper_loss=0.09061, over 3887134.32 frames. 
], batch size: 59, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:57:11,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2222910.0, ans=0.125 2024-08-13 16:57:21,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2223010.0, ans=10.0 2024-08-13 16:57:28,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2223010.0, ans=0.125 2024-08-13 16:57:31,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-13 16:57:37,714 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 16:57:51,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2024-08-13 16:58:08,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 4950, loss[loss=0.1042, beats_loss=0.009956, ecapa_loss=0.0001694, whisper_loss=0.09255, over 16106.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001625, whisper_loss=0.09064, over 3890592.01 frames. 
], batch size: 65, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:58:15,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2223310.0, ans=0.125 2024-08-13 16:58:20,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2223310.0, ans=0.025 2024-08-13 16:58:26,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.318e+01 2.496e+01 2.819e+01 1.833e+02, threshold=4.991e+01, percent-clipped=1.0 2024-08-13 16:58:37,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-13 16:59:04,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2223610.0, ans=0.125 2024-08-13 16:59:26,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5000, loss[loss=0.1094, beats_loss=0.01108, ecapa_loss=0.000182, whisper_loss=0.09651, over 21797.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01095, ecapa_loss=0.000162, whisper_loss=0.09024, over 3883528.84 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 16:59:34,122 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 16:59:41,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2223910.0, ans=0.125 2024-08-13 16:59:43,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. 
limit=15.0 2024-08-13 17:00:03,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2224010.0, ans=0.125 2024-08-13 17:00:04,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2224010.0, ans=0.0 2024-08-13 17:00:17,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5 2024-08-13 17:00:32,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2224210.0, ans=0.125 2024-08-13 17:00:41,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5050, loss[loss=0.08984, beats_loss=0.01516, ecapa_loss=0.0001386, whisper_loss=0.0733, over 20903.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01097, ecapa_loss=0.0001617, whisper_loss=0.09082, over 3892101.85 frames. ], batch size: 84, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:00:57,685 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 17:01:00,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.341e+01 2.653e+01 3.152e+01 4.271e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-13 17:01:01,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-13 17:01:14,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2224510.0, ans=0.0 2024-08-13 17:01:22,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. 
limit=10.0 2024-08-13 17:01:27,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2224610.0, ans=0.1 2024-08-13 17:01:28,896 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 17:01:30,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2224610.0, ans=0.125 2024-08-13 17:01:30,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2224610.0, ans=0.1 2024-08-13 17:01:34,217 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 17:01:48,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2224710.0, ans=0.0 2024-08-13 17:01:49,392 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 17:01:57,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5100, loss[loss=0.09052, beats_loss=0.01086, ecapa_loss=0.000137, whisper_loss=0.07829, over 20310.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01103, ecapa_loss=0.0001612, whisper_loss=0.09091, over 3899680.04 frames. ], batch size: 78, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:02:12,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-13 17:02:12,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. 
limit=12.0 2024-08-13 17:02:19,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2224910.0, ans=0.0 2024-08-13 17:02:52,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2225110.0, ans=0.125 2024-08-13 17:03:02,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2225210.0, ans=0.0 2024-08-13 17:03:04,955 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 17:03:09,648 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-13 17:03:14,258 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5150, loss[loss=0.1116, beats_loss=0.008631, ecapa_loss=0.0001661, whisper_loss=0.1013, over 17451.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.0001605, whisper_loss=0.09174, over 3912038.66 frames. ], batch size: 67, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:03:16,896 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 17:03:20,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2225310.0, ans=0.125 2024-08-13 17:03:29,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.384e+01 2.654e+01 2.972e+01 6.587e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-13 17:03:33,378 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-13 17:03:38,706 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 17:03:41,496 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 17:03:48,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2225510.0, ans=0.1 2024-08-13 17:04:03,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2225610.0, ans=0.2 2024-08-13 17:04:18,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2225710.0, ans=0.125 2024-08-13 17:04:23,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2225710.0, ans=0.125 2024-08-13 17:04:28,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5200, loss[loss=0.1108, beats_loss=0.007901, ecapa_loss=0.0001797, whisper_loss=0.1011, over 17393.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.00016, whisper_loss=0.09179, over 3886579.95 frames. ], batch size: 69, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:04:33,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2225810.0, ans=0.2 2024-08-13 17:05:02,939 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 17:05:16,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-13 17:05:41,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5250, loss[loss=0.1225, beats_loss=0.008842, ecapa_loss=0.0001816, whisper_loss=0.1118, over 15629.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001605, whisper_loss=0.09142, over 3855572.97 frames. 
], batch size: 59, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:05:58,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.417e+01 2.576e+01 2.914e+01 4.655e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 17:06:09,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2226410.0, ans=0.0 2024-08-13 17:06:11,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2226510.0, ans=0.125 2024-08-13 17:06:21,728 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 17:06:25,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2226510.0, ans=0.2 2024-08-13 17:06:34,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2226610.0, ans=0.0 2024-08-13 17:06:37,186 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 17:06:45,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.37 vs. limit=22.5 2024-08-13 17:06:54,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2226710.0, ans=0.125 2024-08-13 17:06:58,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2226810.0, ans=0.125 2024-08-13 17:06:59,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5300, loss[loss=0.1034, beats_loss=0.01097, ecapa_loss=0.0001379, whisper_loss=0.09104, over 22944.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001612, whisper_loss=0.09185, over 3860224.98 frames. 
], batch size: 90, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:07:01,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2226810.0, ans=15.0 2024-08-13 17:07:06,662 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 32 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 17:07:55,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.55 vs. limit=22.5 2024-08-13 17:08:16,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5350, loss[loss=0.08685, beats_loss=0.0145, ecapa_loss=0.0001406, whisper_loss=0.07094, over 23189.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01082, ecapa_loss=0.00016, whisper_loss=0.09202, over 3892053.56 frames. ], batch size: 95, lr: 3.94e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:08:20,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2227310.0, ans=0.125 2024-08-13 17:08:25,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2227310.0, ans=0.1 2024-08-13 17:08:34,132 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.323e+01 2.552e+01 2.858e+01 4.460e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-13 17:08:34,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2227410.0, ans=0.125 2024-08-13 17:08:43,050 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 17:08:44,841 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
32 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-13 17:09:00,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2227510.0, ans=0.125 2024-08-13 17:09:20,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2227710.0, ans=0.0 2024-08-13 17:09:35,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5400, loss[loss=0.128, beats_loss=0.009976, ecapa_loss=0.0001777, whisper_loss=0.1162, over 24178.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01069, ecapa_loss=0.0001611, whisper_loss=0.09246, over 3855640.27 frames. ], batch size: 96, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:09:50,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2227910.0, ans=0.125 2024-08-13 17:09:52,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2227910.0, ans=0.125 2024-08-13 17:09:56,046 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 17:10:03,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=22.5 2024-08-13 17:10:18,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-13 17:10:18,995 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-13 17:10:22,263 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 17:10:33,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. 
limit=15.0 2024-08-13 17:10:35,023 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 17:10:35,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2228110.0, ans=0.05 2024-08-13 17:10:47,325 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 17:10:50,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2228210.0, ans=0.0 2024-08-13 17:10:53,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5450, loss[loss=0.0858, beats_loss=0.01264, ecapa_loss=0.0001043, whisper_loss=0.07212, over 17797.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001615, whisper_loss=0.09188, over 3858583.27 frames. ], batch size: 66, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:11:11,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.403e+01 2.600e+01 2.908e+01 1.736e+02, threshold=5.201e+01, percent-clipped=2.0 2024-08-13 17:11:12,176 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 17:11:25,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2228510.0, ans=10.0 2024-08-13 17:11:26,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0 2024-08-13 17:11:44,156 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
26 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 17:11:46,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2228610.0, ans=0.0 2024-08-13 17:11:47,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2228610.0, ans=0.1 2024-08-13 17:11:59,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2228710.0, ans=0.125 2024-08-13 17:12:12,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5500, loss[loss=0.09961, beats_loss=0.01253, ecapa_loss=0.0001908, whisper_loss=0.08517, over 15995.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01074, ecapa_loss=0.0001612, whisper_loss=0.09184, over 3864959.77 frames. ], batch size: 67, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:12:41,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2229010.0, ans=0.0 2024-08-13 17:12:43,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2229010.0, ans=0.2 2024-08-13 17:12:58,139 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 17:13:05,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=22.5 2024-08-13 17:13:13,286 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 17:13:19,582 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 17:13:29,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2229310.0, ans=0.125 2024-08-13 17:13:30,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5550, loss[loss=0.07973, beats_loss=0.01303, ecapa_loss=0.0001714, whisper_loss=0.06499, over 14475.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01071, ecapa_loss=0.0001608, whisper_loss=0.09202, over 3886759.59 frames. ], batch size: 59, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:13:34,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2229310.0, ans=0.125 2024-08-13 17:13:37,438 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-13 17:13:43,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2229310.0, ans=0.125 2024-08-13 17:13:51,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.399e+01 2.709e+01 2.923e+01 5.241e+01, threshold=5.419e+01, percent-clipped=1.0 2024-08-13 17:13:53,916 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 17:14:09,688 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 17:14:30,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-13 17:14:42,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2229710.0, ans=0.2 2024-08-13 17:14:43,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2229710.0, ans=0.125 2024-08-13 17:14:52,988 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 17:14:53,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2229810.0, ans=0.0 2024-08-13 17:14:54,205 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5600, loss[loss=0.1116, beats_loss=0.01177, ecapa_loss=0.0001449, whisper_loss=0.09843, over 23581.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001618, whisper_loss=0.09182, over 3909168.92 frames. ], batch size: 94, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:14:54,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2229810.0, ans=0.2 2024-08-13 17:14:58,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2229810.0, ans=0.125 2024-08-13 17:15:18,602 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-13 17:15:18,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-13 17:15:21,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2229910.0, ans=0.125 2024-08-13 17:15:26,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2230010.0, ans=0.0 2024-08-13 17:15:43,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2230110.0, ans=0.125 2024-08-13 17:15:45,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2230110.0, ans=0.125 2024-08-13 17:15:47,023 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-13 17:15:53,427 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 17:16:03,085 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.362e-03 2024-08-13 17:16:11,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5650, loss[loss=0.09822, beats_loss=0.01227, ecapa_loss=0.0001683, whisper_loss=0.08427, over 21002.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01086, ecapa_loss=0.0001613, whisper_loss=0.09087, over 3923335.37 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:16:24,111 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 17:16:27,188 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 17:16:29,601 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.435e+01 2.742e+01 3.035e+01 1.015e+02, threshold=5.483e+01, percent-clipped=1.0 2024-08-13 17:16:41,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2230510.0, ans=0.125 2024-08-13 17:16:46,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.36 vs. limit=22.5 2024-08-13 17:16:58,523 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 17:17:09,214 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 17:17:12,324 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
25 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 17:17:29,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5700, loss[loss=0.09824, beats_loss=0.01116, ecapa_loss=0.0001736, whisper_loss=0.08535, over 21566.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001616, whisper_loss=0.09171, over 3938293.61 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:17:32,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2230810.0, ans=0.125 2024-08-13 17:17:32,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=12.0 2024-08-13 17:17:34,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2230810.0, ans=0.125 2024-08-13 17:17:51,670 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 17:18:32,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2231210.0, ans=0.2 2024-08-13 17:18:35,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2231210.0, ans=0.125 2024-08-13 17:18:37,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2024-08-13 17:18:48,329 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5750, loss[loss=0.08521, beats_loss=0.01121, ecapa_loss=0.0001183, whisper_loss=0.07281, over 20093.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001628, whisper_loss=0.09229, over 3932542.80 frames. 
], batch size: 75, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:19:07,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.349e+01 2.635e+01 2.966e+01 1.104e+02, threshold=5.269e+01, percent-clipped=1.0 2024-08-13 17:19:09,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2231410.0, ans=0.0 2024-08-13 17:19:17,066 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 17:19:20,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2231510.0, ans=0.0 2024-08-13 17:19:27,062 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 17:19:46,489 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 17:19:57,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2231710.0, ans=0.1 2024-08-13 17:20:05,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5800, loss[loss=0.1041, beats_loss=0.01226, ecapa_loss=0.0001663, whisper_loss=0.09013, over 21848.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001629, whisper_loss=0.09217, over 3916243.64 frames. ], batch size: 89, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:20:09,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2231810.0, ans=0.125 2024-08-13 17:20:12,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2231810.0, ans=0.2 2024-08-13 17:20:26,118 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
30 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 17:20:33,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=16.30 vs. limit=15.0 2024-08-13 17:20:40,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2232010.0, ans=0.125 2024-08-13 17:20:43,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2232010.0, ans=0.2 2024-08-13 17:20:47,831 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 17:20:53,947 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 17:21:03,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2232110.0, ans=0.0 2024-08-13 17:21:10,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2232210.0, ans=10.0 2024-08-13 17:21:20,014 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5850, loss[loss=0.1081, beats_loss=0.009908, ecapa_loss=0.000157, whisper_loss=0.0966, over 23160.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001624, whisper_loss=0.09143, over 3931840.24 frames. ], batch size: 89, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:21:32,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.64 vs. 
limit=22.5 2024-08-13 17:21:37,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.300e+01 2.602e+01 2.847e+01 5.570e+01, threshold=5.204e+01, percent-clipped=1.0 2024-08-13 17:21:38,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2232410.0, ans=0.95 2024-08-13 17:21:43,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2232410.0, ans=0.1 2024-08-13 17:21:53,910 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:22:01,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2232510.0, ans=0.125 2024-08-13 17:22:23,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-08-13 17:22:24,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2232710.0, ans=0.0 2024-08-13 17:22:33,014 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5900, loss[loss=0.09937, beats_loss=0.01286, ecapa_loss=0.0001102, whisper_loss=0.0854, over 23899.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001613, whisper_loss=0.09105, over 3929762.08 frames. ], batch size: 91, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:23:02,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.96 vs. 
limit=10.0 2024-08-13 17:23:03,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2233010.0, ans=0.0 2024-08-13 17:23:17,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2233110.0, ans=0.04949747468305833 2024-08-13 17:23:25,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-13 17:23:36,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2233210.0, ans=0.0 2024-08-13 17:23:43,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2233310.0, ans=0.125 2024-08-13 17:23:44,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 5950, loss[loss=0.1215, beats_loss=0.008464, ecapa_loss=0.0001442, whisper_loss=0.1116, over 18796.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01091, ecapa_loss=0.0001609, whisper_loss=0.09091, over 3921651.74 frames. ], batch size: 69, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:23:46,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2233310.0, ans=0.125 2024-08-13 17:23:55,628 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 17:23:57,080 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
30 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 17:24:00,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2233410.0, ans=0.125 2024-08-13 17:24:01,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.407e+01 2.699e+01 3.092e+01 2.272e+02, threshold=5.398e+01, percent-clipped=4.0 2024-08-13 17:24:10,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2233410.0, ans=0.125 2024-08-13 17:24:19,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-08-13 17:24:41,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2233710.0, ans=0.0 2024-08-13 17:24:50,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-08-13 17:24:54,038 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 17:24:57,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2233810.0, ans=0.0 2024-08-13 17:24:57,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6000, loss[loss=0.1181, beats_loss=0.01058, ecapa_loss=0.0001471, whisper_loss=0.106, over 16188.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001615, whisper_loss=0.0912, over 3912845.79 frames. 
], batch size: 62, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:24:57,906 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 17:25:32,954 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005624, whisper_loss=0.2475, over 922467.00 frames. 2024-08-13 17:25:51,929 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on SV_voxceleb1: loss=0.004549, beats_loss=0, ecapa_loss=0.0004549, whisper_loss=0, over 939242.00 frames. 2024-08-13 17:26:33,745 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5659, 3.8501, 4.3321, 4.4643], device='cuda:3') 2024-08-13 17:27:33,675 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on AT_audioset: loss=0.02369, beats_loss=0.02369, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 17:27:33,679 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-13 17:27:35,147 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 17:27:35,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2233810.0, ans=0.125 2024-08-13 17:27:44,664 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 17:27:45,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2024-08-13 17:27:47,847 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 17:28:00,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2233910.0, ans=0.125 2024-08-13 17:28:20,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2234110.0, ans=0.0 2024-08-13 17:28:23,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2234110.0, ans=0.1 2024-08-13 17:28:38,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-13 17:28:39,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-13 17:28:47,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6050, loss[loss=0.1133, beats_loss=0.01162, ecapa_loss=0.0001325, whisper_loss=0.1004, over 21803.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.09173, over 3894728.47 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:29:03,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2234410.0, ans=6.0 2024-08-13 17:29:06,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.365e+01 2.582e+01 2.840e+01 3.927e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-13 17:29:14,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2234410.0, ans=0.125 2024-08-13 17:29:16,589 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 17:29:20,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2234510.0, ans=0.125 2024-08-13 17:29:37,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2234610.0, ans=0.2 2024-08-13 17:29:42,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-13 17:29:43,258 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 17:29:43,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2234610.0, ans=0.04949747468305833 2024-08-13 17:30:03,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6100, loss[loss=0.09664, beats_loss=0.01259, ecapa_loss=0.0001447, whisper_loss=0.08261, over 21285.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001624, whisper_loss=0.09203, over 3887364.02 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:30:04,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.77 vs. limit=22.5 2024-08-13 17:30:07,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2234810.0, ans=0.04949747468305833 2024-08-13 17:30:22,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2234910.0, ans=0.125 2024-08-13 17:30:29,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. 
limit=22.5 2024-08-13 17:30:33,085 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 17:30:42,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2235010.0, ans=0.1 2024-08-13 17:30:52,758 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 17:31:15,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6150, loss[loss=0.09588, beats_loss=0.01154, ecapa_loss=0.000188, whisper_loss=0.08246, over 21321.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.000163, whisper_loss=0.09149, over 3885700.90 frames. ], batch size: 89, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:31:23,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2235310.0, ans=0.2 2024-08-13 17:31:23,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2235310.0, ans=0.125 2024-08-13 17:31:33,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.323e+01 2.638e+01 2.979e+01 5.632e+01, threshold=5.276e+01, percent-clipped=1.0 2024-08-13 17:31:55,146 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 42 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 17:32:03,746 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 17:32:15,806 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 17:32:17,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2235710.0, ans=0.125 2024-08-13 17:32:29,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6200, loss[loss=0.13, beats_loss=0.007814, ecapa_loss=0.0001841, whisper_loss=0.1204, over 23237.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01082, ecapa_loss=0.0001628, whisper_loss=0.09231, over 3912694.10 frames. ], batch size: 88, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:32:39,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2235810.0, ans=0.1 2024-08-13 17:32:42,581 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 17:32:52,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2235910.0, ans=0.125 2024-08-13 17:32:57,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=12.0 2024-08-13 17:32:59,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2235910.0, ans=0.125 2024-08-13 17:33:09,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-08-13 17:33:13,387 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 17:33:15,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=22.09 vs. limit=22.5 2024-08-13 17:33:19,669 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 17:33:25,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2236110.0, ans=0.1 2024-08-13 17:33:36,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2236210.0, ans=0.125 2024-08-13 17:33:48,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6250, loss[loss=0.09169, beats_loss=0.01214, ecapa_loss=0.0001205, whisper_loss=0.07834, over 20410.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001623, whisper_loss=0.09202, over 3930004.67 frames. ], batch size: 81, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:34:01,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=15.0 2024-08-13 17:34:03,429 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 17:34:05,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2236410.0, ans=0.125 2024-08-13 17:34:05,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.429e+01 2.660e+01 2.868e+01 5.842e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 17:34:20,672 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 15 from LS+wenet, 27 from Vox, 48 fro AS 2024-08-13 17:34:39,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2236610.0, ans=0.125 2024-08-13 17:34:41,483 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
19 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 17:34:43,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2236610.0, ans=0.125 2024-08-13 17:34:59,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2236710.0, ans=0.125 2024-08-13 17:35:04,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6300, loss[loss=0.06866, beats_loss=0.0137, ecapa_loss=0.0002026, whisper_loss=0.05293, over 20444.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001631, whisper_loss=0.09172, over 3914719.20 frames. ], batch size: 93, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:35:17,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-13 17:35:21,145 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-13 17:35:22,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-13 17:35:22,918 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 17:35:43,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2237010.0, ans=0.0 2024-08-13 17:36:20,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0 2024-08-13 17:36:22,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6350, loss[loss=0.1013, beats_loss=0.007269, ecapa_loss=0.0002035, whisper_loss=0.092, over 13762.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.000165, whisper_loss=0.09117, over 3892159.30 frames. 
], batch size: 54, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:36:28,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2237310.0, ans=0.125 2024-08-13 17:36:33,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2237310.0, ans=0.0 2024-08-13 17:36:33,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2237310.0, ans=0.125 2024-08-13 17:36:40,809 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 17:36:42,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.465e+01 2.710e+01 3.055e+01 1.101e+02, threshold=5.419e+01, percent-clipped=2.0 2024-08-13 17:36:54,961 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 17:37:11,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2237510.0, ans=0.09899494936611666 2024-08-13 17:37:15,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.39 vs. limit=22.5 2024-08-13 17:37:45,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6400, loss[loss=0.1122, beats_loss=0.0112, ecapa_loss=0.0001313, whisper_loss=0.09968, over 19276.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001647, whisper_loss=0.09106, over 3895751.10 frames. ], batch size: 73, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:37:45,941 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 17:37:57,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2024-08-13 17:38:01,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=12.0 2024-08-13 17:38:10,073 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 33 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-13 17:38:11,804 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 17:38:14,641 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 17:38:35,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2024-08-13 17:39:03,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6450, loss[loss=0.1126, beats_loss=0.01148, ecapa_loss=0.0001523, whisper_loss=0.09962, over 23169.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001638, whisper_loss=0.09119, over 3929537.09 frames. ], batch size: 95, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:39:19,354 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 17:39:22,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.439e+01 2.707e+01 3.110e+01 4.905e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-13 17:39:53,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. 
limit=22.5 2024-08-13 17:40:00,008 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.690e+05 2024-08-13 17:40:00,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-13 17:40:01,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2238610.0, ans=0.1 2024-08-13 17:40:05,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2238710.0, ans=0.125 2024-08-13 17:40:08,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2238710.0, ans=0.1 2024-08-13 17:40:22,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6500, loss[loss=0.09263, beats_loss=0.01282, ecapa_loss=0.0001701, whisper_loss=0.0781, over 18588.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.0001638, whisper_loss=0.09157, over 3895603.53 frames. ], batch size: 79, lr: 3.93e-03, grad_scale: 1.152921504606847e+18 2024-08-13 17:40:35,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2238910.0, ans=0.1 2024-08-13 17:40:40,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2238910.0, ans=0.125 2024-08-13 17:40:41,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2238910.0, ans=0.015 2024-08-13 17:40:49,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=15.0
2024-08-13 17:40:53,087 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 from AS
2024-08-13 17:40:54,715 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 18 from Vox, 15 from AS
2024-08-13 17:40:57,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2239010.0, ans=0.05
2024-08-13 17:41:03,187 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 18 from LS+wenet, 32 from Vox, 40 from AS
2024-08-13 17:41:07,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0
2024-08-13 17:41:14,938 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 from AS
2024-08-13 17:41:21,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2239110.0, ans=0.0
2024-08-13 17:41:23,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2239210.0, ans=0.125
2024-08-13 17:41:33,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2239210.0, ans=0.125
2024-08-13 17:41:39,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6550, loss[loss=0.0879, beats_loss=0.009663, ecapa_loss=0.0001974, whisper_loss=0.07627, over 20565.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.000164, whisper_loss=0.09158, over 3924043.09 frames. ], batch size: 88, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:41:46,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2239310.0, ans=0.05
2024-08-13 17:41:52,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2239310.0, ans=0.1
2024-08-13 17:41:53,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2239410.0, ans=0.0
2024-08-13 17:41:57,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.430e+01 2.695e+01 2.938e+01 3.674e+01, threshold=5.390e+01, percent-clipped=0.0
2024-08-13 17:42:17,042 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 from AS
2024-08-13 17:42:41,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2239710.0, ans=0.125
2024-08-13 17:42:44,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=12.0
2024-08-13 17:42:52,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2239710.0, ans=0.0
2024-08-13 17:42:58,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6600, loss[loss=0.1155, beats_loss=0.009134, ecapa_loss=0.0001466, whisper_loss=0.1049, over 20814.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01078, ecapa_loss=0.0001632, whisper_loss=0.09261, over 3948733.57 frames. ], batch size: 79, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:43:01,110 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 23 from Vox, 25 from AS
2024-08-13 17:43:11,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2239810.0, ans=0.0
2024-08-13 17:43:34,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2240010.0, ans=0.0
2024-08-13 17:44:11,175 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 15 from Vox, 44 from AS
2024-08-13 17:44:18,397 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 24 from Vox, 24 from AS
2024-08-13 17:44:18,747 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:44:26,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6650, loss[loss=0.09433, beats_loss=0.01093, ecapa_loss=0.0001797, whisper_loss=0.0816, over 19520.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01079, ecapa_loss=0.000164, whisper_loss=0.09249, over 3949790.00 frames. ], batch size: 79, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:44:46,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.433e+01 2.609e+01 2.879e+01 3.999e+01, threshold=5.218e+01, percent-clipped=0.0
2024-08-13 17:44:51,298 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 23 from Vox, 31 from AS
2024-08-13 17:45:09,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0
2024-08-13 17:45:18,076 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 from AS
2024-08-13 17:45:28,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2240610.0, ans=0.2
2024-08-13 17:45:39,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2240710.0, ans=0.0
2024-08-13 17:45:47,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2240710.0, ans=0.125
2024-08-13 17:45:50,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6700, loss[loss=0.06339, beats_loss=0.01356, ecapa_loss=0.0001393, whisper_loss=0.04843, over 16115.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01078, ecapa_loss=0.0001632, whisper_loss=0.09288, over 3915436.20 frames. ], batch size: 66, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:46:51,678 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 17:46:57,202 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 from AS
2024-08-13 17:47:15,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6750, loss[loss=0.0978, beats_loss=0.01157, ecapa_loss=0.0001357, whisper_loss=0.08488, over 16327.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01075, ecapa_loss=0.0001638, whisper_loss=0.09348, over 3915988.82 frames. ], batch size: 65, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:47:20,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2241310.0, ans=0.125
2024-08-13 17:47:22,621 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 33 from Vox, 28 from AS
2024-08-13 17:47:28,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2241310.0, ans=0.125
2024-08-13 17:47:30,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2241310.0, ans=0.09899494936611666
2024-08-13 17:47:37,463 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.480e+01 2.825e+01 3.141e+01 1.321e+02, threshold=5.651e+01, percent-clipped=2.0
2024-08-13 17:47:42,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2241410.0, ans=0.2
2024-08-13 17:47:49,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0
2024-08-13 17:48:08,458 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 26 from Vox, 44 from AS
2024-08-13 17:48:17,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2241610.0, ans=0.125
2024-08-13 17:48:30,503 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 from AS
2024-08-13 17:48:38,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6800, loss[loss=0.1118, beats_loss=0.007801, ecapa_loss=0.0001777, whisper_loss=0.1022, over 18719.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001644, whisper_loss=0.09228, over 3915498.10 frames. ], batch size: 75, lr: 3.92e-03, grad_scale: 1.152921504606847e+18
2024-08-13 17:48:53,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2241810.0, ans=0.125
2024-08-13 17:48:59,233 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:49:18,502 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 from AS
2024-08-13 17:49:18,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2242010.0, ans=0.0
2024-08-13 17:49:48,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2242210.0, ans=0.125
2024-08-13 17:49:50,045 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 20 from Vox, 38 from AS
2024-08-13 17:49:54,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2242210.0, ans=10.0
2024-08-13 17:49:55,944 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 from AS
2024-08-13 17:50:00,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6850, loss[loss=0.0966, beats_loss=0.01127, ecapa_loss=0.0001221, whisper_loss=0.08411, over 17026.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001641, whisper_loss=0.09204, over 3914690.39 frames. ], batch size: 67, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:50:06,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2242310.0, ans=0.0
2024-08-13 17:50:08,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2242310.0, ans=0.1
2024-08-13 17:50:15,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2242410.0, ans=0.0
2024-08-13 17:50:20,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.361e+01 2.636e+01 2.867e+01 1.284e+02, threshold=5.272e+01, percent-clipped=1.0
2024-08-13 17:50:25,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2242410.0, ans=0.0
2024-08-13 17:50:28,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2242410.0, ans=0.1
2024-08-13 17:50:43,032 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 17:50:57,278 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 17 from Vox, 38 from AS
2024-08-13 17:51:05,399 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 from AS
2024-08-13 17:51:12,100 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 12 from Vox, 36 from AS
2024-08-13 17:51:15,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2242710.0, ans=0.125
2024-08-13 17:51:20,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2242810.0, ans=0.125
2024-08-13 17:51:20,986 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6900, loss[loss=0.0875, beats_loss=0.01199, ecapa_loss=0.000142, whisper_loss=0.07409, over 19277.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001644, whisper_loss=0.09206, over 3913915.72 frames. ], batch size: 79, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:51:21,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2242810.0, ans=0.125
2024-08-13 17:51:37,827 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 from AS
2024-08-13 17:51:42,531 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 from AS
2024-08-13 17:52:03,100 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 from AS
2024-08-13 17:52:05,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.58 vs. limit=22.5
2024-08-13 17:52:08,333 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:52:42,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 6950, loss[loss=0.1157, beats_loss=0.01057, ecapa_loss=0.0001602, whisper_loss=0.1035, over 16198.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001648, whisper_loss=0.09249, over 3912510.94 frames. ], batch size: 63, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:52:45,136 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 from AS
2024-08-13 17:52:53,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0
2024-08-13 17:53:02,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.338e+01 2.546e+01 2.937e+01 5.530e+01, threshold=5.093e+01, percent-clipped=1.0
2024-08-13 17:53:05,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2243410.0, ans=0.0
2024-08-13 17:53:05,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0
2024-08-13 17:53:07,934 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-13 17:53:27,878 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 from AS
2024-08-13 17:53:39,357 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 20 from Vox, 33 from AS
2024-08-13 17:54:03,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7000, loss[loss=0.08309, beats_loss=0.008843, ecapa_loss=0.0001498, whisper_loss=0.07275, over 15021.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.000164, whisper_loss=0.09169, over 3893560.21 frames. ], batch size: 57, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:54:09,245 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 from AS
2024-08-13 17:54:19,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0
2024-08-13 17:54:24,541 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS
2024-08-13 17:54:37,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2244010.0, ans=0.1
2024-08-13 17:54:52,449 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS
2024-08-13 17:55:01,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2244110.0, ans=0.125
2024-08-13 17:55:14,129 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 from AS
2024-08-13 17:55:16,013 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 from AS
2024-08-13 17:55:26,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7050, loss[loss=0.1197, beats_loss=0.00787, ecapa_loss=0.0001609, whisper_loss=0.1103, over 18597.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001636, whisper_loss=0.09151, over 3927965.32 frames. ], batch size: 72, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:55:31,782 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 from AS
2024-08-13 17:55:44,982 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 17:55:45,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-08-13 17:55:48,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.461e+01 2.675e+01 2.991e+01 1.291e+02, threshold=5.351e+01, percent-clipped=1.0
2024-08-13 17:55:53,091 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS
2024-08-13 17:55:55,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2244410.0, ans=0.0
2024-08-13 17:56:00,830 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 23 from Vox, 23 from AS
2024-08-13 17:56:05,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2244510.0, ans=0.0
2024-08-13 17:56:11,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2244510.0, ans=0.1
2024-08-13 17:56:20,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2244610.0, ans=0.2
2024-08-13 17:56:23,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 17:56:43,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2244710.0, ans=0.2
2024-08-13 17:56:47,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7100, loss[loss=0.133, beats_loss=0.008991, ecapa_loss=0.0001635, whisper_loss=0.1224, over 22676.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001631, whisper_loss=0.09202, over 3897092.02 frames. ], batch size: 84, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:56:49,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0
2024-08-13 17:56:51,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2244810.0, ans=0.025
2024-08-13 17:57:00,298 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 30 from Vox, 32 from AS
2024-08-13 17:57:04,396 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 from AS
2024-08-13 17:57:07,764 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS
2024-08-13 17:57:19,231 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 from AS
2024-08-13 17:57:31,840 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 32 from LS+wenet, 20 from Vox, 44 from AS
2024-08-13 17:57:37,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2245110.0, ans=0.125
2024-08-13 17:57:56,513 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 26 from Vox, 31 from AS
2024-08-13 17:58:04,594 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS
2024-08-13 17:58:05,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5
2024-08-13 17:58:08,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7150, loss[loss=0.1096, beats_loss=0.01091, ecapa_loss=0.0001692, whisper_loss=0.097, over 23207.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001623, whisper_loss=0.09189, over 3932309.00 frames. ], batch size: 92, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:58:12,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2245310.0, ans=0.125
2024-08-13 17:58:31,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.399e+01 2.676e+01 3.068e+01 5.307e+01, threshold=5.353e+01, percent-clipped=0.0
2024-08-13 17:58:35,712 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 from AS
2024-08-13 17:58:38,919 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 from AS
2024-08-13 17:59:01,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0
2024-08-13 17:59:08,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2245610.0, ans=0.0
2024-08-13 17:59:21,216 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS
2024-08-13 17:59:22,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2245710.0, ans=0.125
2024-08-13 17:59:33,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7200, loss[loss=0.07153, beats_loss=0.01149, ecapa_loss=0.0002136, whisper_loss=0.05791, over 13784.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001627, whisper_loss=0.09213, over 3928325.99 frames. ], batch size: 61, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 17:59:39,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2245810.0, ans=0.0
2024-08-13 17:59:40,467 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 9 from Vox, 30 from AS
2024-08-13 17:59:53,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0
2024-08-13 17:59:56,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2245910.0, ans=0.125
2024-08-13 18:00:05,071 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS
2024-08-13 18:00:05,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-08-13 18:00:22,952 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS
2024-08-13 18:00:32,954 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 17 from Vox, 49 from AS
2024-08-13 18:00:44,719 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:00:50,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5
2024-08-13 18:00:53,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7250, loss[loss=0.07082, beats_loss=0.01303, ecapa_loss=0.0001601, whisper_loss=0.05619, over 20910.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01075, ecapa_loss=0.0001628, whisper_loss=0.09214, over 3922908.57 frames. ], batch size: 85, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:00:55,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2246310.0, ans=0.0
2024-08-13 18:00:57,173 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 from AS
2024-08-13 18:01:15,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.504e+01 2.815e+01 3.088e+01 1.145e+02, threshold=5.629e+01, percent-clipped=1.0
2024-08-13 18:01:15,615 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 17 from Vox, 27 from AS
2024-08-13 18:01:16,779 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-13 18:01:38,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2246510.0, ans=0.0
2024-08-13 18:02:15,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7300, loss[loss=0.09956, beats_loss=0.01144, ecapa_loss=0.000173, whisper_loss=0.08639, over 14676.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01081, ecapa_loss=0.0001625, whisper_loss=0.09198, over 3915153.95 frames. ], batch size: 61, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:02:17,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=12.0
2024-08-13 18:02:35,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. limit=5.0
2024-08-13 18:02:43,115 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS
2024-08-13 18:02:51,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2247010.0, ans=0.05
2024-08-13 18:02:58,230 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:02:59,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5
2024-08-13 18:03:01,315 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 31 from LS+wenet, 23 from Vox, 23 from AS
2024-08-13 18:03:03,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2247110.0, ans=0.035
2024-08-13 18:03:26,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2247210.0, ans=15.0
2024-08-13 18:03:27,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2247210.0, ans=0.0
2024-08-13 18:03:32,164 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 18:03:32,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2247210.0, ans=0.125
2024-08-13 18:03:36,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7350, loss[loss=0.1189, beats_loss=0.01219, ecapa_loss=0.000119, whisper_loss=0.1056, over 17626.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001613, whisper_loss=0.09163, over 3896317.06 frames. ], batch size: 68, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:03:55,911 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 from AS
2024-08-13 18:03:58,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.425e+01 2.697e+01 3.109e+01 4.252e+01, threshold=5.395e+01, percent-clipped=0.0
2024-08-13 18:04:17,462 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 from AS
2024-08-13 18:04:20,543 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 25 from Vox, 25 from AS
2024-08-13 18:04:32,140 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 from AS
2024-08-13 18:04:34,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.43 vs. limit=15.0
2024-08-13 18:04:46,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2247710.0, ans=0.2
2024-08-13 18:04:59,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7400, loss[loss=0.08012, beats_loss=0.01225, ecapa_loss=0.0001527, whisper_loss=0.06634, over 16187.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01078, ecapa_loss=0.0001624, whisper_loss=0.09174, over 3878311.73 frames. ], batch size: 66, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:05:04,536 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 from AS
2024-08-13 18:05:24,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0
2024-08-13 18:05:43,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2248010.0, ans=0.0
2024-08-13 18:05:53,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2248110.0, ans=0.0
2024-08-13 18:05:58,012 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 from AS
2024-08-13 18:06:17,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7450, loss[loss=0.1227, beats_loss=0.009222, ecapa_loss=0.0001472, whisper_loss=0.112, over 19248.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.000162, whisper_loss=0.09143, over 3852984.03 frames. ], batch size: 73, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:06:24,198 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 12 from Vox, 32 from AS
2024-08-13 18:06:25,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=15.0
2024-08-13 18:06:30,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0
2024-08-13 18:06:35,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0
2024-08-13 18:06:37,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.475e+01 2.713e+01 3.023e+01 4.384e+01, threshold=5.427e+01, percent-clipped=0.0
2024-08-13 18:06:49,045 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 34 from Vox, 31 from AS
2024-08-13 18:06:52,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2248510.0, ans=0.0
2024-08-13 18:07:37,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7500, loss[loss=0.09385, beats_loss=0.01171, ecapa_loss=0.0001725, whisper_loss=0.08041, over 22299.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001628, whisper_loss=0.09148, over 3858529.92 frames. ], batch size: 92, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:07:42,339 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 13 from Vox, 31 from AS
2024-08-13 18:07:58,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2248910.0, ans=0.125
2024-08-13 18:08:11,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2249010.0, ans=0.1
2024-08-13 18:08:25,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2249010.0, ans=0.125
2024-08-13 18:08:34,330 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-13 18:08:38,321 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 from AS
2024-08-13 18:08:46,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2249210.0, ans=0.95
2024-08-13 18:08:51,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-08-13 18:08:52,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2249210.0, ans=0.125
2024-08-13 18:08:58,553 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7550, loss[loss=0.08271, beats_loss=0.01034, ecapa_loss=0.0001639, whisper_loss=0.07073, over 17399.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0108, ecapa_loss=0.0001629, whisper_loss=0.0907, over 3845882.02 frames. ], batch size: 70, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:09:03,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2249310.0, ans=0.95
2024-08-13 18:09:11,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2249310.0, ans=0.0
2024-08-13 18:09:16,220 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 from AS
2024-08-13 18:09:19,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.370e+01 2.691e+01 3.011e+01 5.049e+01, threshold=5.381e+01, percent-clipped=0.0
2024-08-13 18:09:32,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0
2024-08-13 18:09:49,768 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 from AS
2024-08-13 18:10:01,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2249710.0, ans=0.0
2024-08-13 18:10:03,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2249710.0, ans=0.0
2024-08-13 18:10:06,755 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:10:08,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0
2024-08-13 18:10:14,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2249710.0, ans=0.2
2024-08-13 18:10:17,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7600, loss[loss=0.1085, beats_loss=0.01165, ecapa_loss=0.000156, whisper_loss=0.09527, over 17919.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001632, whisper_loss=0.09077, over 3823278.29 frames. ], batch size: 73, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:10:22,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2249810.0, ans=0.0
2024-08-13 18:10:31,417 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 from AS
2024-08-13 18:10:33,457 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 16 from Vox, 31 from AS
2024-08-13 18:10:33,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2249910.0, ans=0.1
2024-08-13 18:10:36,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2249910.0, ans=0.0
2024-08-13 18:10:38,214 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 13 from LS+wenet, 25 from Vox, 26 from AS
2024-08-13 18:10:39,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2249910.0, ans=0.125
2024-08-13 18:10:47,391 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 from AS
2024-08-13 18:10:52,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0
2024-08-13 18:10:58,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2250010.0, ans=0.125
2024-08-13 18:11:09,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2250110.0, ans=0.125
2024-08-13 18:11:24,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2250210.0, ans=0.1
2024-08-13 18:11:28,917 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 16 from Vox, 47 from AS
2024-08-13 18:11:32,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2250210.0, ans=0.0
2024-08-13 18:11:36,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0
2024-08-13 18:11:37,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2250310.0, ans=0.125
2024-08-13 18:11:38,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7650, loss[loss=0.0906, beats_loss=0.008861, ecapa_loss=0.000175, whisper_loss=0.07999, over 13795.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001643, whisper_loss=0.0908, over 3817340.88 frames. ], batch size: 54, lr: 3.92e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:11:38,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2250310.0, ans=0.0
2024-08-13 18:11:57,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.418e+01 2.664e+01 3.048e+01 4.401e+01, threshold=5.328e+01, percent-clipped=0.0
2024-08-13 18:11:58,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2250410.0, ans=0.125
2024-08-13 18:12:01,156 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 31 from Vox, 32 from AS
2024-08-13 18:12:08,177 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 28 from Vox, 42 from AS
2024-08-13 18:12:24,881 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 from AS
2024-08-13 18:12:27,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2250610.0, ans=0.0
2024-08-13 18:12:33,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2250610.0, ans=0.125
2024-08-13 18:12:48,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2250710.0, ans=0.125
2024-08-13 18:12:52,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7700, loss[loss=0.1011, beats_loss=0.009915, ecapa_loss=0.0001598, whisper_loss=0.08958, over 15926.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.000164, whisper_loss=0.09082, over 3865620.86 frames. ], batch size: 62, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:12:55,733 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 18:12:59,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2250810.0, ans=0.125 2024-08-13 18:13:01,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2250810.0, ans=0.125 2024-08-13 18:13:01,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5 2024-08-13 18:13:06,606 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 18:13:11,467 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 32 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 18:13:25,876 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.687e-01 2024-08-13 18:13:27,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2251010.0, ans=0.1 2024-08-13 18:13:31,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2251010.0, ans=0.125 2024-08-13 18:13:50,642 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 18:13:53,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-13 18:14:05,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7750, loss[loss=0.1297, beats_loss=0.008833, ecapa_loss=0.0001854, whisper_loss=0.119, over 22387.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001629, whisper_loss=0.09169, over 3883466.54 frames. 
], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:14:10,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2251310.0, ans=0.0 2024-08-13 18:14:19,104 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 18:14:20,701 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 18:14:22,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2251410.0, ans=0.0 2024-08-13 18:14:24,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.510e+01 2.727e+01 3.092e+01 1.354e+02, threshold=5.455e+01, percent-clipped=2.0 2024-08-13 18:14:24,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2251410.0, ans=0.125 2024-08-13 18:14:26,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-13 18:14:40,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2251510.0, ans=0.0 2024-08-13 18:14:46,046 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 18:15:00,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-13 18:15:01,261 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 18:15:08,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2251710.0, ans=0.1 2024-08-13 18:15:10,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-13 18:15:17,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7800, loss[loss=0.1133, beats_loss=0.01071, ecapa_loss=0.0001397, whisper_loss=0.1012, over 23421.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001626, whisper_loss=0.09154, over 3897585.54 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:15:30,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2251810.0, ans=0.0 2024-08-13 18:15:36,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2251910.0, ans=0.0 2024-08-13 18:15:46,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-13 18:15:46,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2024-08-13 18:15:55,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2252010.0, ans=0.1 2024-08-13 18:15:58,027 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 18:15:58,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2252010.0, ans=0.1 2024-08-13 18:16:30,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7850, loss[loss=0.098, beats_loss=0.009536, ecapa_loss=0.0002064, whisper_loss=0.0864, over 20083.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001623, whisper_loss=0.09206, over 3914927.07 frames. ], batch size: 84, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:16:48,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.348e+01 2.635e+01 3.053e+01 4.732e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-13 18:16:52,123 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 18:16:54,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.73 vs. limit=12.0 2024-08-13 18:16:55,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2252410.0, ans=0.1 2024-08-13 18:17:09,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2252510.0, ans=0.1 2024-08-13 18:17:32,538 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 18:17:34,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-13 18:17:44,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7900, loss[loss=0.1059, beats_loss=0.01148, ecapa_loss=0.0001694, whisper_loss=0.09275, over 21462.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01077, ecapa_loss=0.0001615, whisper_loss=0.09268, over 3892801.48 frames. 
], batch size: 85, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:17:47,256 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-13 18:17:56,963 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 18:18:01,099 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 18:18:02,586 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 18:18:14,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2253010.0, ans=0.125 2024-08-13 18:18:20,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2253010.0, ans=0.0 2024-08-13 18:18:44,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2253210.0, ans=0.0 2024-08-13 18:18:57,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 7950, loss[loss=0.08871, beats_loss=0.01055, ecapa_loss=0.0001372, whisper_loss=0.07679, over 21582.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01077, ecapa_loss=0.0001611, whisper_loss=0.09253, over 3878948.28 frames. 
], batch size: 87, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:19:06,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2253310.0, ans=0.0 2024-08-13 18:19:10,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2253410.0, ans=0.1 2024-08-13 18:19:10,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2253410.0, ans=0.0 2024-08-13 18:19:15,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2253410.0, ans=0.125 2024-08-13 18:19:15,899 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.363e+01 2.645e+01 3.044e+01 5.205e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-13 18:19:24,136 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-13 18:19:24,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2253410.0, ans=0.125 2024-08-13 18:19:29,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2253510.0, ans=0.125 2024-08-13 18:19:31,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2253510.0, ans=0.125 2024-08-13 18:19:36,107 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 18:19:43,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2253610.0, ans=0.125 2024-08-13 18:19:44,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2253610.0, ans=0.1 2024-08-13 18:19:51,815 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 18:20:01,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2253710.0, ans=0.125 2024-08-13 18:20:04,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2253710.0, ans=0.1 2024-08-13 18:20:13,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8000, loss[loss=0.1175, beats_loss=0.01013, ecapa_loss=0.0001622, whisper_loss=0.1057, over 23790.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01078, ecapa_loss=0.0001594, whisper_loss=0.09292, over 3895386.17 frames. ], batch size: 93, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:20:26,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2253810.0, ans=0.2 2024-08-13 18:20:36,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-13 18:20:41,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2253910.0, ans=0.07 2024-08-13 18:20:53,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2254010.0, ans=0.125 2024-08-13 18:20:59,002 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 18:21:12,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2254210.0, ans=0.125 2024-08-13 18:21:19,459 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 18:21:26,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8050, loss[loss=0.09934, beats_loss=0.009475, ecapa_loss=0.0001569, whisper_loss=0.0883, over 17409.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01074, ecapa_loss=0.0001609, whisper_loss=0.09276, over 3870694.12 frames. ], batch size: 69, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:21:46,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.332e+01 2.558e+01 3.003e+01 5.582e+01, threshold=5.115e+01, percent-clipped=1.0 2024-08-13 18:21:52,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2254410.0, ans=0.0 2024-08-13 18:22:06,517 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 18:22:09,260 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 18:22:13,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2254610.0, ans=0.125 2024-08-13 18:22:31,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-13 18:22:32,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2254710.0, ans=0.125 2024-08-13 18:22:38,352 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8100, loss[loss=0.07391, beats_loss=0.01533, ecapa_loss=0.0001289, whisper_loss=0.05729, over 21604.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01071, ecapa_loss=0.0001614, whisper_loss=0.09254, over 3881671.42 frames. ], batch size: 87, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:22:44,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2254810.0, ans=0.0 2024-08-13 18:23:08,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2024-08-13 18:23:15,397 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 18:23:17,099 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 18:23:49,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8150, loss[loss=0.1113, beats_loss=0.01144, ecapa_loss=0.0001174, whisper_loss=0.09868, over 22812.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01073, ecapa_loss=0.0001613, whisper_loss=0.09225, over 3913321.70 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:24:09,230 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.440e+01 2.841e+01 3.164e+01 5.500e+01, threshold=5.681e+01, percent-clipped=1.0 2024-08-13 18:24:27,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2255510.0, ans=0.125 2024-08-13 18:24:30,666 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 18:24:36,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2255610.0, ans=0.2 2024-08-13 18:25:00,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2255710.0, ans=15.0 2024-08-13 18:25:01,386 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 18:25:02,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8200, loss[loss=0.11, beats_loss=0.01059, ecapa_loss=0.0001663, whisper_loss=0.09772, over 18982.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01072, ecapa_loss=0.0001618, whisper_loss=0.09201, over 3909216.68 frames. ], batch size: 77, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:25:13,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2255810.0, ans=0.125 2024-08-13 18:25:17,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2255910.0, ans=0.125 2024-08-13 18:25:19,015 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.507e-01 2024-08-13 18:25:21,272 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-13 18:25:29,997 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-13 18:25:32,897 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 18:25:38,103 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 18:25:43,356 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 18:25:47,841 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 14 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 18:25:54,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2256110.0, ans=0.125 2024-08-13 18:26:14,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8250, loss[loss=0.1024, beats_loss=0.01263, ecapa_loss=0.0001578, whisper_loss=0.08823, over 21683.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.0001617, whisper_loss=0.09103, over 3877112.71 frames. ], batch size: 89, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:26:18,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2256310.0, ans=0.1 2024-08-13 18:26:22,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2256310.0, ans=0.0 2024-08-13 18:26:32,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.303e+01 2.576e+01 2.826e+01 3.811e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-13 18:26:51,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2256510.0, ans=0.125 2024-08-13 18:27:01,097 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 18:27:06,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2024-08-13 18:27:17,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.15 vs. 
limit=15.0 2024-08-13 18:27:24,815 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04663357511162758, model_norm_threshold=51.52228546142578 2024-08-13 18:27:25,050 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.95, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.156e+06, grad_sumsq=1.333e+05, orig_rms_sq=8.675e+00 2024-08-13 18:27:25,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8300, loss[loss=0.1007, beats_loss=0.01332, ecapa_loss=0.0001538, whisper_loss=0.08581, over 18821.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001601, whisper_loss=0.09098, over 3904963.88 frames. ], batch size: 78, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:27:33,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2256810.0, ans=0.125 2024-08-13 18:27:42,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2256910.0, ans=0.125 2024-08-13 18:27:44,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2256910.0, ans=0.0 2024-08-13 18:27:44,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2256910.0, ans=0.125 2024-08-13 18:27:46,870 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 18:28:00,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2257010.0, ans=0.125 2024-08-13 18:28:07,116 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 18:28:11,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2257110.0, ans=0.2 2024-08-13 18:28:14,564 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 33 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-13 18:28:14,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2257110.0, ans=0.0 2024-08-13 18:28:30,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8350, loss[loss=0.1213, beats_loss=0.008802, ecapa_loss=0.000155, whisper_loss=0.1109, over 21617.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.09153, over 3895466.37 frames. ], batch size: 81, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:28:46,847 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 18:28:47,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.502e+01 2.789e+01 3.217e+01 1.105e+03, threshold=5.579e+01, percent-clipped=3.0 2024-08-13 18:28:58,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2257510.0, ans=0.0 2024-08-13 18:29:06,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2257510.0, ans=0.125 2024-08-13 18:29:25,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2257710.0, ans=0.0 2024-08-13 18:29:27,039 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 18:29:35,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. 
limit=22.5 2024-08-13 18:29:35,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8400, loss[loss=0.1082, beats_loss=0.0115, ecapa_loss=0.0001889, whisper_loss=0.09485, over 20651.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001616, whisper_loss=0.09182, over 3875470.19 frames. ], batch size: 88, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:29:48,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-13 18:30:02,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2258010.0, ans=0.125 2024-08-13 18:30:34,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-13 18:30:35,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2258210.0, ans=0.2 2024-08-13 18:30:36,276 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-13 18:30:36,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2258210.0, ans=0.0 2024-08-13 18:30:42,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8450, loss[loss=0.09249, beats_loss=0.01335, ecapa_loss=0.0001559, whisper_loss=0.07758, over 22759.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01072, ecapa_loss=0.000162, whisper_loss=0.09228, over 3896353.23 frames. 
], batch size: 93, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:30:59,943 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.525e+01 2.749e+01 3.077e+01 1.697e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 18:31:30,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2258610.0, ans=0.04949747468305833 2024-08-13 18:31:31,265 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 18:31:34,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2258710.0, ans=0.125 2024-08-13 18:31:37,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2258710.0, ans=0.0 2024-08-13 18:31:37,958 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 18:31:48,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8500, loss[loss=0.09627, beats_loss=0.0121, ecapa_loss=0.0001444, whisper_loss=0.08272, over 22471.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001617, whisper_loss=0.09223, over 3931953.59 frames. ], batch size: 90, lr: 3.91e-03, grad_scale: 5.764607523034235e+17 2024-08-13 18:31:48,514 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
31 from LS+wenet, 22 from Vox, 33 from AS
2024-08-13 18:31:55,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2258810.0, ans=0.95
2024-08-13 18:32:01,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2258910.0, ans=0.125
2024-08-13 18:32:16,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.98 vs. limit=22.5
2024-08-13 18:32:33,376 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 from AS
2024-08-13 18:32:51,777 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 8 from LS+wenet, 14 from Vox, 32 from AS
2024-08-13 18:32:54,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8550, loss[loss=0.1258, beats_loss=0.008938, ecapa_loss=0.0001739, whisper_loss=0.1151, over 14017.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.000162, whisper_loss=0.0922, over 3918685.57 frames. ], batch size: 56, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:32:59,656 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS
2024-08-13 18:33:00,928 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 from AS
2024-08-13 18:33:10,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.379e+01 2.646e+01 2.938e+01 4.520e+01, threshold=5.292e+01, percent-clipped=0.0
2024-08-13 18:33:16,100 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 from AS
2024-08-13 18:33:23,002 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:33:32,240 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 from AS
2024-08-13 18:33:36,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2259610.0, ans=0.2
2024-08-13 18:33:55,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2259710.0, ans=0.2
2024-08-13 18:33:56,369 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 from AS
2024-08-13 18:33:58,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8600, loss[loss=0.1016, beats_loss=0.01349, ecapa_loss=0.000131, whisper_loss=0.08676, over 20878.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01076, ecapa_loss=0.0001618, whisper_loss=0.09208, over 3883399.61 frames. ], batch size: 84, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:33:59,157 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 from AS
2024-08-13 18:34:17,120 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 from AS
2024-08-13 18:34:24,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.09 vs. limit=22.5
2024-08-13 18:34:27,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2260010.0, ans=0.125
2024-08-13 18:34:42,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2260110.0, ans=0.125
2024-08-13 18:34:42,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.01 vs. limit=22.5
2024-08-13 18:35:06,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8650, loss[loss=0.08057, beats_loss=0.01226, ecapa_loss=0.0001811, whisper_loss=0.0665, over 20434.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01087, ecapa_loss=0.0001617, whisper_loss=0.09137, over 3890290.47 frames. ], batch size: 87, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:35:12,751 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 29 from Vox, 39 from AS
2024-08-13 18:35:18,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0
2024-08-13 18:35:19,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2260410.0, ans=0.125
2024-08-13 18:35:24,162 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.420e+01 2.581e+01 2.912e+01 4.652e+01, threshold=5.162e+01, percent-clipped=0.0
2024-08-13 18:35:38,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0
2024-08-13 18:35:41,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2260510.0, ans=0.0
2024-08-13 18:35:49,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=15.0
2024-08-13 18:36:05,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2260710.0, ans=0.125
2024-08-13 18:36:07,529 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 from AS
2024-08-13 18:36:13,636 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 23 from Vox, 27 from AS
2024-08-13 18:36:14,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8700, loss[loss=0.08949, beats_loss=0.01095, ecapa_loss=0.0001836, whisper_loss=0.07671, over 16410.00 frames.
], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001614, whisper_loss=0.09145, over 3901529.78 frames. ], batch size: 67, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:36:17,273 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 18:36:31,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs. limit=10.0
2024-08-13 18:36:42,180 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS
2024-08-13 18:36:51,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2261010.0, ans=0.0
2024-08-13 18:37:05,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2261110.0, ans=0.2
2024-08-13 18:37:32,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8750, loss[loss=0.09754, beats_loss=0.01059, ecapa_loss=0.0001558, whisper_loss=0.08539, over 16118.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001615, whisper_loss=0.09084, over 3842686.19 frames. ], batch size: 64, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:37:40,901 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 from AS
2024-08-13 18:37:50,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.392e+01 2.713e+01 3.025e+01 4.261e+01, threshold=5.425e+01, percent-clipped=0.0
2024-08-13 18:38:08,817 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 10 from Vox, 26 from AS
2024-08-13 18:38:34,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0
2024-08-13 18:38:37,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2261710.0, ans=0.1
2024-08-13 18:38:38,623 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 18:38:54,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8800, loss[loss=0.1071, beats_loss=0.01208, ecapa_loss=0.0001394, whisper_loss=0.0936, over 24031.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01101, ecapa_loss=0.00016, whisper_loss=0.09019, over 3870046.50 frames. ], batch size: 92, lr: 3.91e-03, grad_scale: 5.764607523034235e+17
2024-08-13 18:39:07,244 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 24 from Vox, 44 from AS
2024-08-13 18:39:12,996 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 14 from Vox, 37 from AS
2024-08-13 18:39:36,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2262010.0, ans=0.2
2024-08-13 18:39:56,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2262110.0, ans=0.125
2024-08-13 18:40:19,603 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 from AS
2024-08-13 18:40:26,520 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 from AS
2024-08-13 18:40:27,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.92 vs. limit=22.5
2024-08-13 18:40:33,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8850, loss[loss=0.0939, beats_loss=0.0122, ecapa_loss=0.0001465, whisper_loss=0.08024, over 15974.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01105, ecapa_loss=0.0001605, whisper_loss=0.08996, over 3869724.51 frames. ], batch size: 62, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:40:38,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2262310.0, ans=0.1
2024-08-13 18:40:57,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.612e+01 3.083e+01 5.604e+01, threshold=5.223e+01, percent-clipped=1.0
2024-08-13 18:41:22,233 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 from AS
2024-08-13 18:41:22,514 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-13 18:41:25,067 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 5 from LS+wenet, 24 from Vox, 29 from AS
2024-08-13 18:41:28,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2262510.0, ans=0.125
2024-08-13 18:41:38,822 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 18:41:39,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2262610.0, ans=0.0
2024-08-13 18:41:43,135 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS
2024-08-13 18:42:09,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8900, loss[loss=0.09603, beats_loss=0.01098, ecapa_loss=0.0001287, whisper_loss=0.08377, over 16192.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01105, ecapa_loss=0.0001602, whisper_loss=0.08984, over 3865552.43 frames. ], batch size: 62, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:42:23,199 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts.
29 from LS+wenet, 24 from Vox, 30 from AS
2024-08-13 18:42:43,479 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.318e-02
2024-08-13 18:42:43,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2262910.0, ans=0.125
2024-08-13 18:42:55,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0
2024-08-13 18:43:06,962 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 18:43:07,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2263110.0, ans=0.0
2024-08-13 18:43:07,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0
2024-08-13 18:43:11,796 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 from AS
2024-08-13 18:43:17,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2263110.0, ans=0.125
2024-08-13 18:43:21,831 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 from AS
2024-08-13 18:43:33,202 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 8950, loss[loss=0.1028, beats_loss=0.01126, ecapa_loss=0.0001557, whisper_loss=0.08994, over 22030.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01101, ecapa_loss=0.0001595, whisper_loss=0.09006, over 3856481.89 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:43:37,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0
2024-08-13 18:43:45,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2263410.0, ans=0.2
2024-08-13 18:43:49,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.368e+01 2.604e+01 2.902e+01 4.386e+01, threshold=5.207e+01, percent-clipped=0.0
2024-08-13 18:43:57,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2263510.0, ans=0.0
2024-08-13 18:44:18,986 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 18:44:22,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0
2024-08-13 18:44:24,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2263710.0, ans=0.125
2024-08-13 18:44:29,304 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 from AS
2024-08-13 18:44:37,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2263810.0, ans=0.0
2024-08-13 18:44:38,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9000, loss[loss=0.1056, beats_loss=0.01019, ecapa_loss=0.000182, whisper_loss=0.09362, over 20916.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01099, ecapa_loss=0.0001604, whisper_loss=0.09065, over 3866141.71 frames. ], batch size: 84, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:44:38,236 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-13 18:45:17,668 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005571, whisper_loss=0.2482, over 922467.00 frames.
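In every entry of this excerpt, the logged composite `loss` matches `beats_loss + 10 * ecapa_loss + whisper_loss` (e.g. 0 + 10 × 0.0005571 + 0.2482 ≈ 0.2538 for the ASR_libri validation above), which is consistent with the ecapa scale of 10.0 visible in the experiment directory name. A minimal sketch of that combination, assuming this weighting; the function name and defaults are illustrative, not the actual code in train_multi_KD3.py:

```python
def combine_kd_loss(beats_loss, ecapa_loss, whisper_loss,
                    beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    """Weighted sum of the three distillation losses, matching the
    relation observed in the logged totals. On a single-task
    validation set the inactive losses are logged as 0, so the
    composite reduces to the remaining term."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Validation on ASR_libri from the log: 0 + 10*0.0005571 + 0.2482 ~ 0.2538
asr_valid = combine_kd_loss(0.0, 0.0005571, 0.2482)
```

The same relation holds for the training-side totals, e.g. `combine_kd_loss(0.01072, 0.000162, 0.0922)` reproduces the logged `tot_loss` of 0.1045 at batch 8550 to within rounding.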
2024-08-13 18:45:37,369 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on SV_voxceleb1: loss=0.004514, beats_loss=0, ecapa_loss=0.0004514, whisper_loss=0, over 939242.00 frames.
2024-08-13 18:47:39,620 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 18:47:39,623 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB
2024-08-13 18:47:55,411 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.450e-01
2024-08-13 18:47:59,694 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 from AS
2024-08-13 18:48:01,091 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 from AS
2024-08-13 18:48:18,392 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 from AS
2024-08-13 18:48:21,430 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 from AS
2024-08-13 18:48:34,997 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 31 from LS+wenet, 10 from Vox, 33 from AS
2024-08-13 18:48:38,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2264210.0, ans=0.125
2024-08-13 18:48:47,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9050, loss[loss=0.1148, beats_loss=0.009644, ecapa_loss=0.000181, whisper_loss=0.1033, over 22553.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001615, whisper_loss=0.09118, over 3848590.08 frames. ], batch size: 91, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:49:02,907 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
28 from LS+wenet, 21 from Vox, 39 from AS
2024-08-13 18:49:05,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.411e+01 2.657e+01 3.042e+01 5.076e+01, threshold=5.314e+01, percent-clipped=0.0
2024-08-13 18:49:28,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2264610.0, ans=0.2
2024-08-13 18:49:35,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2264610.0, ans=0.0
2024-08-13 18:49:49,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2264710.0, ans=0.1
2024-08-13 18:49:52,038 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 from AS
2024-08-13 18:49:56,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2264810.0, ans=0.2
2024-08-13 18:49:57,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9100, loss[loss=0.08682, beats_loss=0.01215, ecapa_loss=0.0001355, whisper_loss=0.07331, over 16806.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001611, whisper_loss=0.09189, over 3873384.24 frames. ], batch size: 68, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:50:17,143 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.009e+01
2024-08-13 18:50:17,997 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 from AS
2024-08-13 18:50:22,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0
2024-08-13 18:50:23,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2265010.0, ans=0.125
2024-08-13 18:50:45,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2265110.0, ans=0.2
2024-08-13 18:50:46,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2265110.0, ans=0.07
2024-08-13 18:50:55,525 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 from AS
2024-08-13 18:50:59,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2265210.0, ans=0.125
2024-08-13 18:51:00,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5
2024-08-13 18:51:06,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2265310.0, ans=0.125
2024-08-13 18:51:07,384 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9150, loss[loss=0.06314, beats_loss=0.01613, ecapa_loss=0.0001746, whisper_loss=0.04527, over 17884.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01076, ecapa_loss=0.0001611, whisper_loss=0.0926, over 3918907.08 frames. ], batch size: 77, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:51:26,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.406e+01 2.793e+01 3.104e+01 4.161e+01, threshold=5.587e+01, percent-clipped=0.0
2024-08-13 18:51:34,178 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 from AS
2024-08-13 18:51:39,856 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 16 from Vox, 50 from AS
2024-08-13 18:51:40,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2265510.0, ans=0.125
2024-08-13 18:51:51,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2265610.0, ans=0.0
2024-08-13 18:51:55,214 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 from AS
2024-08-13 18:51:59,237 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 26 from LS+wenet, 13 from Vox, 18 from AS
2024-08-13 18:52:08,439 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-13 18:52:15,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0
2024-08-13 18:52:16,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2265810.0, ans=0.1
2024-08-13 18:52:17,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9200, loss[loss=0.1109, beats_loss=0.009973, ecapa_loss=0.0001482, whisper_loss=0.09943, over 20047.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001599, whisper_loss=0.09178, over 3914581.54 frames. ], batch size: 79, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:52:23,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2265810.0, ans=0.0
2024-08-13 18:52:29,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.14 vs. limit=10.0
2024-08-13 18:52:59,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.91 vs.
limit=15.0
2024-08-13 18:53:00,096 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS
2024-08-13 18:53:00,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0
2024-08-13 18:53:02,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2266110.0, ans=0.0
2024-08-13 18:53:10,531 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 from AS
2024-08-13 18:53:24,386 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9250, loss[loss=0.088, beats_loss=0.009468, ecapa_loss=0.000212, whisper_loss=0.07642, over 14002.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001604, whisper_loss=0.09157, over 3915534.65 frames. ], batch size: 59, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:53:36,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2266410.0, ans=0.2
2024-08-13 18:53:38,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2266410.0, ans=0.0
2024-08-13 18:53:41,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.413e+01 2.566e+01 3.082e+01 5.176e+01, threshold=5.131e+01, percent-clipped=0.0
2024-08-13 18:53:44,837 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 31 from LS+wenet, 17 from Vox, 22 from AS
2024-08-13 18:53:46,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2266410.0, ans=0.125
2024-08-13 18:53:50,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0
2024-08-13 18:53:51,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2266510.0, ans=0.09899494936611666
2024-08-13 18:53:57,159 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS
2024-08-13 18:53:58,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=8.0
2024-08-13 18:54:03,678 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 18 from LS+wenet, 20 from Vox, 45 from AS
2024-08-13 18:54:03,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2266610.0, ans=0.1
2024-08-13 18:54:14,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2266610.0, ans=0.0
2024-08-13 18:54:32,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9300, loss[loss=0.1106, beats_loss=0.01241, ecapa_loss=0.0001343, whisper_loss=0.09684, over 23393.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001607, whisper_loss=0.0918, over 3913086.07 frames. ], batch size: 91, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:54:32,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2266810.0, ans=0.0
2024-08-13 18:54:49,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2266910.0, ans=0.125
2024-08-13 18:54:53,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2266910.0, ans=0.07
2024-08-13 18:54:55,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=12.0
2024-08-13 18:55:11,133 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 11 from Vox, 23 from AS
2024-08-13 18:55:22,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2267110.0, ans=0.1
2024-08-13 18:55:23,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2024-08-13 18:55:23,856 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 16 from Vox, 38 from AS
2024-08-13 18:55:25,318 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 from AS
2024-08-13 18:55:41,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9350, loss[loss=0.104, beats_loss=0.0105, ecapa_loss=0.000157, whisper_loss=0.09193, over 22582.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001624, whisper_loss=0.09139, over 3882434.08 frames. ], batch size: 91, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:55:58,975 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.391e+01 2.659e+01 2.911e+01 1.966e+02, threshold=5.317e+01, percent-clipped=2.0
2024-08-13 18:56:37,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2267710.0, ans=0.1
2024-08-13 18:56:41,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2267710.0, ans=0.125
2024-08-13 18:56:44,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2267710.0, ans=0.125
2024-08-13 18:56:44,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2267710.0, ans=0.125
2024-08-13 18:56:48,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9400, loss[loss=0.094, beats_loss=0.01276,
ecapa_loss=0.0001147, whisper_loss=0.0801, over 18160.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01083, ecapa_loss=0.0001623, whisper_loss=0.09043, over 3882895.25 frames. ], batch size: 72, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:56:55,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2267810.0, ans=0.0
2024-08-13 18:57:02,208 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 15 from Vox, 42 from AS
2024-08-13 18:57:02,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2267910.0, ans=0.0
2024-08-13 18:57:08,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0
2024-08-13 18:57:13,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0
2024-08-13 18:57:40,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2268210.0, ans=0.125
2024-08-13 18:57:41,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.25 vs. limit=15.0
2024-08-13 18:57:54,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9450, loss[loss=0.1008, beats_loss=0.008528, ecapa_loss=0.0001985, whisper_loss=0.09026, over 23161.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01077, ecapa_loss=0.0001628, whisper_loss=0.09025, over 3877196.49 frames. ], batch size: 95, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:57:56,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0
2024-08-13 18:57:59,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0
2024-08-13 18:58:02,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2268310.0, ans=0.125
2024-08-13 18:58:07,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2268410.0, ans=0.1
2024-08-13 18:58:08,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2268410.0, ans=0.125
2024-08-13 18:58:12,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.364e+01 2.605e+01 2.951e+01 9.303e+01, threshold=5.211e+01, percent-clipped=2.0
2024-08-13 18:58:15,997 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 34 from LS+wenet, 16 from Vox, 22 from AS
2024-08-13 18:58:35,239 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS
2024-08-13 18:58:41,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2268610.0, ans=0.0
2024-08-13 18:58:48,676 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 from AS
2024-08-13 18:58:51,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2268710.0, ans=0.1
2024-08-13 18:58:53,871 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS
2024-08-13 18:58:54,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2268710.0, ans=0.0
2024-08-13 18:58:56,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2268710.0, ans=0.125
2024-08-13 18:59:00,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9500, loss[loss=0.1237, beats_loss=0.01116, ecapa_loss=0.0001329, whisper_loss=0.1112, over 23478.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.0001631, whisper_loss=0.09132, over 3894809.06 frames. ], batch size: 91, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 18:59:10,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2268810.0, ans=0.0
2024-08-13 18:59:20,649 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 17 from Vox, 47 from AS
2024-08-13 18:59:38,435 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 18:59:55,208 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 38 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 18:59:55,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2269210.0, ans=0.125
2024-08-13 18:59:57,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.77 vs. limit=22.5
2024-08-13 18:59:59,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2269210.0, ans=0.0
2024-08-13 19:00:06,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9550, loss[loss=0.1002, beats_loss=0.01196, ecapa_loss=0.000129, whisper_loss=0.08692, over 23799.00 frames.
], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001636, whisper_loss=0.09143, over 3857881.36 frames. ], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:00:16,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2269310.0, ans=0.0
2024-08-13 19:00:18,789 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 14 from Vox, 45 from AS
2024-08-13 19:00:19,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2269410.0, ans=0.05
2024-08-13 19:00:21,416 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 19:00:23,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.300e+01 2.521e+01 2.795e+01 4.846e+01, threshold=5.041e+01, percent-clipped=0.0
2024-08-13 19:00:37,005 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 23 from Vox, 35 from AS
2024-08-13 19:00:59,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2269710.0, ans=0.0
2024-08-13 19:01:01,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2269710.0, ans=0.125
2024-08-13 19:01:08,985 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS
2024-08-13 19:01:11,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9600, loss[loss=0.1115, beats_loss=0.01143, ecapa_loss=0.0001522, whisper_loss=0.09851, over 22747.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001622, whisper_loss=0.0908, over 3826008.18 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:01:17,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2269810.0, ans=0.0
2024-08-13 19:01:22,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2269810.0, ans=0.2
2024-08-13 19:01:44,321 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 14 from Vox, 28 from AS
2024-08-13 19:01:46,788 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS
2024-08-13 19:01:55,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=2270110.0, ans=0.1
2024-08-13 19:02:05,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2270210.0, ans=0.125
2024-08-13 19:02:10,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0
2024-08-13 19:02:11,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2270210.0, ans=0.04949747468305833
2024-08-13 19:02:16,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9650, loss[loss=0.1203, beats_loss=0.009541, ecapa_loss=0.000156, whisper_loss=0.1092, over 17260.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.0001622, whisper_loss=0.0905, over 3816857.24 frames. ], batch size: 66, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:02:19,566 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 12 from Vox, 29 from AS
2024-08-13 19:02:27,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0
2024-08-13 19:02:33,502 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.348e+01 2.592e+01 2.887e+01 4.146e+01, threshold=5.184e+01, percent-clipped=0.0
2024-08-13 19:02:36,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2270410.0, ans=0.1
2024-08-13 19:02:39,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2270410.0, ans=0.125
2024-08-13 19:02:57,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2270610.0, ans=0.0
2024-08-13 19:02:58,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0
2024-08-13 19:03:10,189 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 19:03:21,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9700, loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001679, whisper_loss=0.08961, over 21511.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.000162, whisper_loss=0.09042, over 3803103.88 frames. ], batch size: 89, lr: 3.90e-03, grad_scale: 1.152921504606847e+18
2024-08-13 19:03:31,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2270810.0, ans=0.0
2024-08-13 19:03:39,080 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
37 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 19:03:39,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=2270910.0, ans=12.0 2024-08-13 19:03:40,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2270910.0, ans=0.1 2024-08-13 19:03:40,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2270910.0, ans=0.125 2024-08-13 19:03:41,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2270910.0, ans=0.1 2024-08-13 19:03:50,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2271010.0, ans=0.0 2024-08-13 19:04:01,055 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 19:04:13,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2271210.0, ans=0.125 2024-08-13 19:04:24,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2271210.0, ans=0.1 2024-08-13 19:04:26,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9750, loss[loss=0.09364, beats_loss=0.01181, ecapa_loss=0.0002014, whisper_loss=0.07981, over 19678.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001617, whisper_loss=0.0906, over 3844307.36 frames. 
], batch size: 93, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:04:36,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2271310.0, ans=0.2 2024-08-13 19:04:39,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2271410.0, ans=0.0 2024-08-13 19:04:40,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2271410.0, ans=0.125 2024-08-13 19:04:43,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.462e+01 2.717e+01 3.058e+01 5.863e+01, threshold=5.433e+01, percent-clipped=1.0 2024-08-13 19:04:46,912 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 19:04:52,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2271510.0, ans=0.125 2024-08-13 19:05:01,078 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 39 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-13 19:05:07,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2271610.0, ans=0.1 2024-08-13 19:05:32,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9800, loss[loss=0.1055, beats_loss=0.009179, ecapa_loss=0.0001893, whisper_loss=0.09443, over 22197.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001616, whisper_loss=0.09088, over 3826228.70 frames. ], batch size: 90, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:06:08,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.28 vs. 
limit=22.5 2024-08-13 19:06:14,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2272110.0, ans=0.1 2024-08-13 19:06:16,496 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-13 19:06:28,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2272210.0, ans=0.125 2024-08-13 19:06:37,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9850, loss[loss=0.1091, beats_loss=0.01242, ecapa_loss=0.000121, whisper_loss=0.09545, over 21824.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001616, whisper_loss=0.0909, over 3850143.97 frames. ], batch size: 85, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:06:49,314 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 19:06:49,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2272410.0, ans=0.125 2024-08-13 19:06:50,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2272410.0, ans=0.0 2024-08-13 19:06:54,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.358e+01 2.690e+01 3.043e+01 6.098e+01, threshold=5.380e+01, percent-clipped=1.0 2024-08-13 19:06:57,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2272410.0, ans=0.125 2024-08-13 19:06:57,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2272410.0, ans=0.1 2024-08-13 19:07:02,752 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-13 19:07:06,759 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 19:07:07,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2024-08-13 19:07:18,307 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-13 19:07:27,273 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 19:07:42,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9900, loss[loss=0.1012, beats_loss=0.00935, ecapa_loss=0.0001843, whisper_loss=0.09003, over 20586.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01084, ecapa_loss=0.0001605, whisper_loss=0.09132, over 3865062.39 frames. ], batch size: 85, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:08:02,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-13 19:08:16,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2273010.0, ans=0.0 2024-08-13 19:08:18,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. 
limit=22.5 2024-08-13 19:08:23,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=2273110.0, ans=12.0 2024-08-13 19:08:39,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2273210.0, ans=0.125 2024-08-13 19:08:42,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2273210.0, ans=0.125 2024-08-13 19:08:47,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 9950, loss[loss=0.1038, beats_loss=0.01113, ecapa_loss=0.0001167, whisper_loss=0.09151, over 18151.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01087, ecapa_loss=0.000161, whisper_loss=0.0905, over 3829623.45 frames. ], batch size: 68, lr: 3.90e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:08:48,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2273310.0, ans=0.07 2024-08-13 19:09:04,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.452e+01 2.692e+01 3.113e+01 1.874e+02, threshold=5.385e+01, percent-clipped=3.0 2024-08-13 19:09:07,796 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-13 19:09:10,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2273410.0, ans=0.1 2024-08-13 19:09:11,673 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 19:09:12,790 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 19:09:23,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. 
limit=6.0 2024-08-13 19:09:28,417 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 32 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-13 19:09:28,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2273610.0, ans=0.0 2024-08-13 19:09:48,096 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 19:09:49,599 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 19:09:52,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10000, loss[loss=0.09841, beats_loss=0.008827, ecapa_loss=0.0001582, whisper_loss=0.088, over 16597.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001608, whisper_loss=0.09113, over 3876196.22 frames. ], batch size: 65, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:10:20,573 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 19:10:40,581 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 19:10:43,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2274210.0, ans=0.125 2024-08-13 19:10:47,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-08-13 19:10:57,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10050, loss[loss=0.09432, beats_loss=0.01054, ecapa_loss=0.0001454, whisper_loss=0.08232, over 19002.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001615, whisper_loss=0.09094, over 3859595.11 frames. ], batch size: 75, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:10:57,664 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
22 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-13 19:11:13,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2274410.0, ans=0.025 2024-08-13 19:11:14,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.462e+01 2.706e+01 3.125e+01 1.991e+02, threshold=5.413e+01, percent-clipped=1.0 2024-08-13 19:11:24,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2024-08-13 19:11:47,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2274710.0, ans=0.0 2024-08-13 19:11:56,180 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 19:12:01,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10100, loss[loss=0.09975, beats_loss=0.01215, ecapa_loss=0.0001715, whisper_loss=0.08588, over 18295.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001626, whisper_loss=0.09077, over 3896576.42 frames. ], batch size: 75, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:12:08,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2274810.0, ans=0.125 2024-08-13 19:12:17,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2274910.0, ans=0.0 2024-08-13 19:12:18,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-13 19:12:24,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2274910.0, ans=0.0 2024-08-13 19:12:26,039 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 19:12:26,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2275010.0, ans=0.125 2024-08-13 19:12:34,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2275010.0, ans=0.0 2024-08-13 19:12:38,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2275010.0, ans=0.0 2024-08-13 19:12:46,071 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-13 19:13:06,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10150, loss[loss=0.1041, beats_loss=0.0113, ecapa_loss=0.0001565, whisper_loss=0.09122, over 23557.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001639, whisper_loss=0.09107, over 3895698.73 frames. ], batch size: 95, lr: 3.89e-03, grad_scale: 1.152921504606847e+18 2024-08-13 19:13:18,399 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 19:13:24,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.395e+01 2.644e+01 2.917e+01 4.595e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-13 19:13:26,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2275410.0, ans=0.0 2024-08-13 19:13:39,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2275510.0, ans=0.125 2024-08-13 19:13:47,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2275610.0, ans=0.0 2024-08-13 19:13:54,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2275610.0, ans=0.125 2024-08-13 19:13:55,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=2275610.0, ans=0.1 2024-08-13 19:14:11,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2275710.0, ans=0.2 2024-08-13 19:14:15,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10200, loss[loss=0.114, beats_loss=0.01031, ecapa_loss=0.0001833, whisper_loss=0.1019, over 22751.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.0001641, whisper_loss=0.09089, over 3861041.48 frames. ], batch size: 91, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:14:36,760 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 19:14:39,868 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 19:14:43,075 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 19:14:51,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2276010.0, ans=0.125 2024-08-13 19:15:09,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-08-13 19:15:21,531 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 19:15:28,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2276210.0, ans=0.125 2024-08-13 19:15:31,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10250, loss[loss=0.09652, beats_loss=0.0124, ecapa_loss=0.0001191, whisper_loss=0.08293, over 21748.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001642, whisper_loss=0.09083, over 3863665.42 frames. ], batch size: 85, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:15:45,345 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 19:15:53,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.373e+01 2.735e+01 3.155e+01 5.239e+01, threshold=5.471e+01, percent-clipped=0.0 2024-08-13 19:16:03,841 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-13 19:16:08,892 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 19:16:24,811 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-13 19:16:26,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2276610.0, ans=0.1 2024-08-13 19:16:28,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2024-08-13 19:16:39,288 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 19:16:42,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2276710.0, ans=0.125 2024-08-13 19:16:46,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10300, loss[loss=0.07059, beats_loss=0.01178, ecapa_loss=0.0001378, whisper_loss=0.05744, over 19968.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001636, whisper_loss=0.09086, over 3884936.48 frames. ], batch size: 78, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:17:01,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2276910.0, ans=0.1 2024-08-13 19:17:23,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2277010.0, ans=0.125 2024-08-13 19:17:25,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2277010.0, ans=0.2 2024-08-13 19:17:25,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2277010.0, ans=0.2 2024-08-13 19:17:25,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2277010.0, ans=0.125 2024-08-13 19:17:48,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, 
batch_count=2277210.0, ans=0.125 2024-08-13 19:17:59,606 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 19:18:02,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2277210.0, ans=0.0 2024-08-13 19:18:03,449 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 19:18:04,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2024-08-13 19:18:04,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10350, loss[loss=0.09839, beats_loss=0.01139, ecapa_loss=0.0001698, whisper_loss=0.0853, over 22261.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001636, whisper_loss=0.09129, over 3921230.36 frames. ], batch size: 91, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:18:26,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.410e+01 2.741e+01 3.127e+01 1.313e+02, threshold=5.482e+01, percent-clipped=3.0 2024-08-13 19:18:31,114 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 19:19:08,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2277710.0, ans=0.0 2024-08-13 19:19:19,704 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 19:19:20,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-13 19:19:20,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10400, loss[loss=0.1055, beats_loss=0.008983, ecapa_loss=0.0001565, whisper_loss=0.09496, over 23344.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001635, whisper_loss=0.09142, over 3904509.62 frames. ], batch size: 90, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:19:30,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2277810.0, ans=0.125 2024-08-13 19:19:33,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2277810.0, ans=0.125 2024-08-13 19:19:44,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2277910.0, ans=0.2 2024-08-13 19:19:52,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2278010.0, ans=10.0 2024-08-13 19:20:12,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2278110.0, ans=0.0 2024-08-13 19:20:19,102 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-13 19:20:33,569 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10450, loss[loss=0.1028, beats_loss=0.009795, ecapa_loss=0.0001797, whisper_loss=0.0912, over 19227.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001623, whisper_loss=0.0916, over 3865342.47 frames. 
], batch size: 79, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:20:50,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2278410.0, ans=0.125 2024-08-13 19:20:55,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.408e+01 2.686e+01 2.992e+01 7.083e+01, threshold=5.372e+01, percent-clipped=1.0 2024-08-13 19:21:07,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2278510.0, ans=0.125 2024-08-13 19:21:15,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2278510.0, ans=0.1 2024-08-13 19:21:36,264 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 19:21:36,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2278710.0, ans=0.1 2024-08-13 19:21:49,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10500, loss[loss=0.08412, beats_loss=0.01364, ecapa_loss=0.0001289, whisper_loss=0.0692, over 15273.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001637, whisper_loss=0.09149, over 3855845.30 frames. ], batch size: 61, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:21:57,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2278810.0, ans=0.0 2024-08-13 19:22:03,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2024-08-13 19:22:05,326 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
23 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 19:22:13,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2278910.0, ans=0.125 2024-08-13 19:22:36,603 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 19:22:53,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2279210.0, ans=0.125 2024-08-13 19:22:58,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2279210.0, ans=0.2 2024-08-13 19:23:05,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10550, loss[loss=0.08545, beats_loss=0.01166, ecapa_loss=0.0001761, whisper_loss=0.07203, over 21006.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001636, whisper_loss=0.09105, over 3855519.35 frames. ], batch size: 87, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:23:07,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2279310.0, ans=0.125 2024-08-13 19:23:29,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.431e+01 2.823e+01 3.244e+01 7.825e+01, threshold=5.646e+01, percent-clipped=1.0 2024-08-13 19:23:31,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=12.0 2024-08-13 19:23:37,431 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 19:23:37,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2279510.0, ans=0.04949747468305833 2024-08-13 19:23:52,425 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 19:24:10,046 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 19:24:19,477 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 19:24:26,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10600, loss[loss=0.08336, beats_loss=0.01371, ecapa_loss=0.0001241, whisper_loss=0.0684, over 17340.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01074, ecapa_loss=0.0001638, whisper_loss=0.0913, over 3852725.60 frames. ], batch size: 71, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:24:27,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2279810.0, ans=0.125 2024-08-13 19:24:44,332 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 19:24:48,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2279910.0, ans=0.2 2024-08-13 19:24:52,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2279910.0, ans=0.125 2024-08-13 19:24:56,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2279910.0, ans=0.07 2024-08-13 19:25:22,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2280110.0, ans=0.125 2024-08-13 19:25:23,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2280110.0, ans=22.5 2024-08-13 19:25:51,504 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10650, loss[loss=0.1155, beats_loss=0.009177, ecapa_loss=0.0001453, whisper_loss=0.1049, over 21622.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001624, whisper_loss=0.091, over 3846750.08 frames. ], batch size: 85, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:26:03,989 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 19:26:15,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.293e+01 2.581e+01 2.913e+01 4.333e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-13 19:26:15,551 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 29 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 19:26:27,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2280510.0, ans=0.0 2024-08-13 19:26:51,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2280610.0, ans=0.125 2024-08-13 19:27:07,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2280710.0, ans=0.2 2024-08-13 19:27:13,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10700, loss[loss=0.1162, beats_loss=0.009222, ecapa_loss=0.0001335, whisper_loss=0.1057, over 21948.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001612, whisper_loss=0.09193, over 3862248.95 frames. ], batch size: 82, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:27:16,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2280810.0, ans=0.125 2024-08-13 19:27:16,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=12.0 2024-08-13 19:27:33,259 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 19:27:43,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2280910.0, ans=0.0 2024-08-13 19:28:34,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10750, loss[loss=0.1262, beats_loss=0.008557, ecapa_loss=0.0001454, whisper_loss=0.1162, over 18378.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001618, whisper_loss=0.09191, over 3841733.71 frames. ], batch size: 68, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:28:57,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.479e+01 2.782e+01 3.163e+01 7.452e+01, threshold=5.564e+01, percent-clipped=1.0 2024-08-13 19:29:12,586 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 19:29:34,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2024-08-13 19:29:44,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2281710.0, ans=0.125 2024-08-13 19:29:55,872 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10800, loss[loss=0.09536, beats_loss=0.01192, ecapa_loss=0.0001584, whisper_loss=0.08186, over 13775.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.000161, whisper_loss=0.09187, over 3874192.10 frames. 
], batch size: 54, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:29:56,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2281810.0, ans=0.0 2024-08-13 19:30:10,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2281910.0, ans=0.125 2024-08-13 19:30:22,917 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 19:30:44,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2282110.0, ans=0.125 2024-08-13 19:30:54,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2282110.0, ans=0.1 2024-08-13 19:31:10,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-13 19:31:11,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2282210.0, ans=0.125 2024-08-13 19:31:16,119 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10850, loss[loss=0.1115, beats_loss=0.01052, ecapa_loss=0.0001995, whisper_loss=0.09894, over 19293.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001617, whisper_loss=0.09246, over 3911033.53 frames. ], batch size: 77, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:31:29,038 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-13 19:31:31,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.20 vs. 
limit=15.0 2024-08-13 19:31:38,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.579e+01 2.775e+01 3.149e+01 7.029e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 19:31:42,327 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 19:31:47,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2282510.0, ans=0.0 2024-08-13 19:31:59,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2282510.0, ans=0.125 2024-08-13 19:32:00,551 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 19:32:30,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2282710.0, ans=0.0 2024-08-13 19:32:40,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10900, loss[loss=0.103, beats_loss=0.00962, ecapa_loss=0.0001875, whisper_loss=0.09149, over 21786.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0001621, whisper_loss=0.09259, over 3930946.40 frames. ], batch size: 92, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:32:52,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2282810.0, ans=0.125 2024-08-13 19:32:52,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2282810.0, ans=0.0 2024-08-13 19:33:02,971 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 19:33:06,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2282910.0, ans=0.2 2024-08-13 19:33:10,045 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 19:33:43,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2024-08-13 19:33:44,510 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 19:33:57,009 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 19:34:00,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 10950, loss[loss=0.08435, beats_loss=0.01206, ecapa_loss=0.0001448, whisper_loss=0.07083, over 16475.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01079, ecapa_loss=0.0001617, whisper_loss=0.0927, over 3951978.11 frames. ], batch size: 67, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:34:09,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2283310.0, ans=0.125 2024-08-13 19:34:13,964 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 19:34:23,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.358e+01 2.590e+01 2.815e+01 3.849e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-13 19:34:30,204 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 19:35:12,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2283710.0, ans=0.0 2024-08-13 19:35:22,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11000, loss[loss=0.1142, beats_loss=0.007858, ecapa_loss=0.0001892, whisper_loss=0.1045, over 19086.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01071, ecapa_loss=0.000163, whisper_loss=0.0929, over 3956171.11 frames. 
], batch size: 76, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:36:20,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-08-13 19:36:23,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2284110.0, ans=0.125 2024-08-13 19:36:25,778 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 19:36:27,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2024-08-13 19:36:45,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11050, loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001444, whisper_loss=0.09244, over 24097.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001619, whisper_loss=0.09248, over 3977107.94 frames. ], batch size: 95, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:37:01,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2284410.0, ans=0.125 2024-08-13 19:37:08,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.460e+01 2.684e+01 3.016e+01 4.539e+01, threshold=5.368e+01, percent-clipped=0.0 2024-08-13 19:37:17,262 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 19:37:26,362 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 19:37:31,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2284510.0, ans=0.2 2024-08-13 19:37:32,737 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
19 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 19:37:37,336 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 19:38:00,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2284710.0, ans=0.0 2024-08-13 19:38:01,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2284710.0, ans=0.0 2024-08-13 19:38:07,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11100, loss[loss=0.1089, beats_loss=0.009974, ecapa_loss=0.0001804, whisper_loss=0.09714, over 23163.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001616, whisper_loss=0.09202, over 3956526.80 frames. ], batch size: 95, lr: 3.89e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:38:07,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2284810.0, ans=0.125 2024-08-13 19:38:20,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2284810.0, ans=0.125 2024-08-13 19:38:24,380 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 19:38:39,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2285010.0, ans=0.125 2024-08-13 19:38:41,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.77 vs. 
limit=22.5 2024-08-13 19:38:46,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2285010.0, ans=0.2 2024-08-13 19:39:02,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2285110.0, ans=0.125 2024-08-13 19:39:20,560 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 19:39:29,448 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11150, loss[loss=0.1065, beats_loss=0.01325, ecapa_loss=0.0001567, whisper_loss=0.09173, over 17456.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001604, whisper_loss=0.0916, over 3943465.55 frames. ], batch size: 71, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:39:33,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2285310.0, ans=0.125 2024-08-13 19:39:52,324 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.378e+01 2.624e+01 2.890e+01 4.520e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-13 19:40:08,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2285510.0, ans=0.125 2024-08-13 19:40:21,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2285610.0, ans=0.125 2024-08-13 19:40:24,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2285610.0, ans=0.0 2024-08-13 19:40:38,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2285710.0, ans=0.0 2024-08-13 19:40:50,402 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 19:40:52,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11200, loss[loss=0.08953, beats_loss=0.01019, ecapa_loss=0.0001874, whisper_loss=0.07746, over 14419.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01072, ecapa_loss=0.0001613, whisper_loss=0.09245, over 3901358.68 frames. ], batch size: 59, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:41:42,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2286110.0, ans=0.0 2024-08-13 19:42:09,236 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 19:42:09,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2024-08-13 19:42:13,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11250, loss[loss=0.07651, beats_loss=0.01354, ecapa_loss=0.0001303, whisper_loss=0.06166, over 14249.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01069, ecapa_loss=0.0001625, whisper_loss=0.09252, over 3889446.36 frames. ], batch size: 57, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:42:26,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2286310.0, ans=0.0 2024-08-13 19:42:34,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.479e+01 2.663e+01 3.017e+01 9.282e+01, threshold=5.327e+01, percent-clipped=2.0 2024-08-13 19:42:40,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2286410.0, ans=0.125 2024-08-13 19:42:44,648 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 19:42:53,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2286510.0, ans=0.0 2024-08-13 19:42:55,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2286510.0, ans=0.0 2024-08-13 19:43:13,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-13 19:43:16,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2286710.0, ans=0.125 2024-08-13 19:43:22,171 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 19:43:25,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2286710.0, ans=0.2 2024-08-13 19:43:27,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.99 vs. limit=22.5 2024-08-13 19:43:31,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11300, loss[loss=0.1069, beats_loss=0.008866, ecapa_loss=0.0001537, whisper_loss=0.09646, over 18251.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01064, ecapa_loss=0.0001615, whisper_loss=0.09235, over 3875323.43 frames. ], batch size: 70, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:43:35,466 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.739e+01 2024-08-13 19:43:40,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2286810.0, ans=0.2 2024-08-13 19:43:50,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2024-08-13 19:43:53,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=12.0 2024-08-13 19:43:55,380 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 19:44:01,454 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 19:44:16,047 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 19:44:19,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2287110.0, ans=0.2 2024-08-13 19:44:21,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2287110.0, ans=0.125 2024-08-13 19:44:25,860 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 19:44:26,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2287110.0, ans=0.0 2024-08-13 19:44:26,998 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 19:44:34,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2024-08-13 19:44:39,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.73 vs. limit=10.0 2024-08-13 19:44:53,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11350, loss[loss=0.1061, beats_loss=0.01072, ecapa_loss=0.0001538, whisper_loss=0.0938, over 22717.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001608, whisper_loss=0.09181, over 3863511.68 frames. 
], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:44:55,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2287310.0, ans=0.05 2024-08-13 19:44:59,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2287310.0, ans=0.2 2024-08-13 19:45:06,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2287310.0, ans=10.0 2024-08-13 19:45:15,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.325e+01 2.679e+01 3.004e+01 6.399e+01, threshold=5.358e+01, percent-clipped=2.0 2024-08-13 19:45:32,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2287510.0, ans=0.0 2024-08-13 19:45:40,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=12.0 2024-08-13 19:46:04,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2287710.0, ans=0.09899494936611666 2024-08-13 19:46:14,165 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11400, loss[loss=0.1105, beats_loss=0.009898, ecapa_loss=0.0001871, whisper_loss=0.09874, over 21326.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01065, ecapa_loss=0.0001611, whisper_loss=0.09268, over 3870854.53 frames. ], batch size: 85, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:46:21,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2287810.0, ans=0.125 2024-08-13 19:46:29,218 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 19:46:35,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.83 vs. limit=10.0 2024-08-13 19:46:38,112 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 19:46:41,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2287910.0, ans=0.1 2024-08-13 19:46:59,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-13 19:47:02,250 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-13 19:47:06,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2288110.0, ans=0.2 2024-08-13 19:47:16,953 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 19:47:20,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2288210.0, ans=0.125 2024-08-13 19:47:33,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11450, loss[loss=0.09127, beats_loss=0.01048, ecapa_loss=0.0001982, whisper_loss=0.07882, over 21631.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01068, ecapa_loss=0.0001615, whisper_loss=0.09238, over 3873883.89 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:47:44,662 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 19:47:45,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. 
limit=6.0 2024-08-13 19:47:55,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.447e+01 2.680e+01 2.957e+01 5.322e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-13 19:47:58,030 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 19:48:17,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2288510.0, ans=0.125 2024-08-13 19:48:28,087 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 19:48:35,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2288710.0, ans=0.125 2024-08-13 19:48:51,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11500, loss[loss=0.07913, beats_loss=0.01127, ecapa_loss=0.0001743, whisper_loss=0.06611, over 19096.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001614, whisper_loss=0.09233, over 3876526.16 frames. ], batch size: 77, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:49:15,189 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 19:49:42,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. 
limit=15.0 2024-08-13 19:49:54,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2289110.0, ans=0.0 2024-08-13 19:49:55,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2289210.0, ans=0.0 2024-08-13 19:49:58,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2289210.0, ans=0.0 2024-08-13 19:50:09,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2289210.0, ans=0.1 2024-08-13 19:50:13,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11550, loss[loss=0.1138, beats_loss=0.01203, ecapa_loss=0.0001225, whisper_loss=0.1005, over 21425.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01054, ecapa_loss=0.0001619, whisper_loss=0.09321, over 3852082.88 frames. ], batch size: 81, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:50:18,299 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 19:50:18,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2289310.0, ans=0.0 2024-08-13 19:50:26,887 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 19:50:36,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.552e+01 2.829e+01 3.234e+01 6.675e+01, threshold=5.658e+01, percent-clipped=2.0 2024-08-13 19:50:59,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2289510.0, ans=0.0 2024-08-13 19:50:59,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. 
limit=15.0 2024-08-13 19:51:05,699 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 19:51:16,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2289610.0, ans=0.1 2024-08-13 19:51:35,569 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11600, loss[loss=0.09305, beats_loss=0.009237, ecapa_loss=0.0001597, whisper_loss=0.08221, over 17791.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01053, ecapa_loss=0.0001622, whisper_loss=0.09318, over 3879588.09 frames. ], batch size: 71, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:51:49,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2289810.0, ans=0.0 2024-08-13 19:51:53,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. limit=10.0 2024-08-13 19:52:07,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2290010.0, ans=0.125 2024-08-13 19:52:24,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2290110.0, ans=0.1 2024-08-13 19:52:41,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2290210.0, ans=0.125 2024-08-13 19:52:44,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2290210.0, ans=0.035 2024-08-13 19:52:47,958 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 19:52:48,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=15.0 2024-08-13 19:52:49,776 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 19:52:59,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11650, loss[loss=0.1287, beats_loss=0.007812, ecapa_loss=0.0001448, whisper_loss=0.1195, over 23136.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0106, ecapa_loss=0.0001613, whisper_loss=0.09308, over 3909105.91 frames. ], batch size: 85, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:53:14,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2290410.0, ans=0.125 2024-08-13 19:53:22,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.408e+01 2.632e+01 2.967e+01 4.953e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-13 19:53:42,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-08-13 19:53:45,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2290510.0, ans=0.0 2024-08-13 19:53:51,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-13 19:54:00,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2290610.0, ans=0.125 2024-08-13 19:54:03,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2290610.0, ans=0.125 2024-08-13 19:54:15,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-08-13 19:54:15,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2290710.0, ans=0.125 2024-08-13 19:54:19,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2290710.0, ans=0.125 2024-08-13 19:54:20,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-13 19:54:23,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11700, loss[loss=0.08041, beats_loss=0.01143, ecapa_loss=0.0001599, whisper_loss=0.06738, over 15541.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01065, ecapa_loss=0.0001618, whisper_loss=0.09255, over 3927558.48 frames. ], batch size: 65, lr: 3.88e-03, grad_scale: 5.764607523034235e+17 2024-08-13 19:54:44,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2290910.0, ans=0.0 2024-08-13 19:55:07,089 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-13 19:55:20,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2291110.0, ans=0.125 2024-08-13 19:55:29,047 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 19:55:42,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.61 vs. limit=22.5 2024-08-13 19:55:44,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.83 vs. 
limit=12.0
2024-08-13 19:55:46,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11750, loss[loss=0.1238, beats_loss=0.009465, ecapa_loss=0.0001298, whisper_loss=0.1131, over 18273.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01063, ecapa_loss=0.0001622, whisper_loss=0.09355, over 3942881.74 frames. ], batch size: 69, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:56:07,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2291410.0, ans=0.125
2024-08-13 19:56:11,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.398e+01 2.617e+01 2.949e+01 4.150e+01, threshold=5.234e+01, percent-clipped=0.0
2024-08-13 19:56:13,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2291410.0, ans=0.0
2024-08-13 19:56:57,867 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS
2024-08-13 19:56:58,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2291710.0, ans=0.125
2024-08-13 19:56:59,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2291710.0, ans=0.95
2024-08-13 19:57:09,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11800, loss[loss=0.1025, beats_loss=0.01184, ecapa_loss=0.0001903, whisper_loss=0.08873, over 22070.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0107, ecapa_loss=0.0001622, whisper_loss=0.09313, over 3940283.43 frames. ], batch size: 92, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:57:30,154 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 fro AS
2024-08-13 19:57:36,201 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
23 from LS+wenet, 24 from Vox, 41 fro AS
2024-08-13 19:57:47,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2292010.0, ans=0.125
2024-08-13 19:58:03,338 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 15 from Vox, 50 fro AS
2024-08-13 19:58:10,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2292110.0, ans=0.125
2024-08-13 19:58:26,691 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS
2024-08-13 19:58:32,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11850, loss[loss=0.1115, beats_loss=0.009666, ecapa_loss=0.0001657, whisper_loss=0.1002, over 24435.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01078, ecapa_loss=0.0001616, whisper_loss=0.09283, over 3952844.57 frames. ], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 19:58:33,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2292310.0, ans=0.0
2024-08-13 19:58:55,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.456e+01 2.721e+01 2.965e+01 7.443e+01, threshold=5.443e+01, percent-clipped=2.0
2024-08-13 19:59:20,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2292610.0, ans=0.125
2024-08-13 19:59:20,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=15.0
2024-08-13 19:59:52,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11900, loss[loss=0.09567, beats_loss=0.01316, ecapa_loss=0.0001257, whisper_loss=0.08125, over 22526.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01084, ecapa_loss=0.0001619, whisper_loss=0.09227, over 3983971.10 frames.
], batch size: 90, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:00:07,907 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 9 from Vox, 28 fro AS
2024-08-13 20:00:15,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2292910.0, ans=0.0
2024-08-13 20:00:31,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2293010.0, ans=0.2
2024-08-13 20:00:36,419 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 18 from Vox, 31 fro AS
2024-08-13 20:01:01,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2024-08-13 20:01:13,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 11950, loss[loss=0.09, beats_loss=0.01198, ecapa_loss=0.0001577, whisper_loss=0.07644, over 21109.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001611, whisper_loss=0.09215, over 3936093.32 frames. ], batch size: 89, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:01:17,675 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 20:01:33,238 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 12 from Vox, 33 fro AS
2024-08-13 20:01:35,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.292e+01 2.621e+01 2.963e+01 5.710e+01, threshold=5.241e+01, percent-clipped=1.0
2024-08-13 20:01:44,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2293510.0, ans=0.1
2024-08-13 20:01:48,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs.
limit=15.0
2024-08-13 20:02:20,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.31 vs. limit=10.0
2024-08-13 20:02:32,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12000, loss[loss=0.1061, beats_loss=0.011, ecapa_loss=0.0001498, whisper_loss=0.09358, over 22941.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.0001608, whisper_loss=0.09135, over 3890331.73 frames. ], batch size: 91, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:02:32,109 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-13 20:03:12,746 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005542, whisper_loss=0.248, over 922467.00 frames.
2024-08-13 20:03:33,706 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on SV_voxceleb1: loss=0.004415, beats_loss=0, ecapa_loss=0.0004415, whisper_loss=0, over 939242.00 frames.
2024-08-13 20:05:21,805 INFO [train_multi_KD3.py:1149] (3/4) Epoch 16, validation on AT_audioset: loss=0.02371, beats_loss=0.02371, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 20:05:21,809 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB
2024-08-13 20:05:22,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2293810.0, ans=0.125
2024-08-13 20:05:22,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=12.0
2024-08-13 20:05:23,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2293810.0, ans=0.125
2024-08-13 20:05:26,400 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 31 from Vox, 27 fro AS
2024-08-13 20:05:30,863 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts.
22 from LS+wenet, 23 from Vox, 27 fro AS
2024-08-13 20:05:43,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=15.0
2024-08-13 20:05:54,168 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.122e-02
2024-08-13 20:06:09,664 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS
2024-08-13 20:06:14,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=15.0
2024-08-13 20:06:15,760 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-13 20:06:35,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2294210.0, ans=0.0
2024-08-13 20:06:40,923 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS
2024-08-13 20:06:44,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12050, loss[loss=0.1034, beats_loss=0.009263, ecapa_loss=0.0001125, whisper_loss=0.09297, over 23471.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001613, whisper_loss=0.09144, over 3871326.07 frames. ], batch size: 86, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:06:51,430 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS
2024-08-13 20:07:01,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.42 vs.
limit=15.0
2024-08-13 20:07:07,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.498e+01 2.752e+01 3.060e+01 1.760e+02, threshold=5.504e+01, percent-clipped=2.0
2024-08-13 20:07:07,989 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS
2024-08-13 20:07:12,656 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS
2024-08-13 20:07:26,114 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS
2024-08-13 20:07:32,162 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 21 from Vox, 48 fro AS
2024-08-13 20:07:35,586 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS
2024-08-13 20:07:41,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5
2024-08-13 20:07:46,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2294610.0, ans=0.125
2024-08-13 20:07:50,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2294710.0, ans=0.125
2024-08-13 20:08:08,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12100, loss[loss=0.1186, beats_loss=0.005589, ecapa_loss=0.000184, whisper_loss=0.1112, over 15574.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001616, whisper_loss=0.09163, over 3877659.67 frames. ], batch size: 58, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:08:12,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs.
limit=10.0
2024-08-13 20:08:17,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2294810.0, ans=0.0
2024-08-13 20:08:25,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2294910.0, ans=0.125
2024-08-13 20:08:37,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2294910.0, ans=0.1
2024-08-13 20:08:37,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2294910.0, ans=0.125
2024-08-13 20:08:38,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0
2024-08-13 20:08:48,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2295010.0, ans=0.125
2024-08-13 20:09:12,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5
2024-08-13 20:09:16,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2295210.0, ans=0.125
2024-08-13 20:09:21,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2295210.0, ans=0.125
2024-08-13 20:09:28,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2295310.0, ans=0.07
2024-08-13 20:09:29,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12150, loss[loss=0.08435, beats_loss=0.01315, ecapa_loss=0.0001669, whisper_loss=0.06953, over 20520.00 frames.
], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001616, whisper_loss=0.09154, over 3847340.40 frames. ], batch size: 87, lr: 3.88e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:09:50,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2295410.0, ans=0.1
2024-08-13 20:09:52,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.320e+01 2.600e+01 2.810e+01 5.391e+01, threshold=5.201e+01, percent-clipped=0.0
2024-08-13 20:09:56,665 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS
2024-08-13 20:10:10,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2295510.0, ans=10.0
2024-08-13 20:10:46,870 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS
2024-08-13 20:10:55,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12200, loss[loss=0.1179, beats_loss=0.01044, ecapa_loss=0.0001791, whisper_loss=0.1057, over 13958.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001611, whisper_loss=0.09152, over 3844430.00 frames. ], batch size: 56, lr: 3.88e-03, grad_scale: 1.152921504606847e+18
2024-08-13 20:11:18,458 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS
2024-08-13 20:12:16,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12250, loss[loss=0.1194, beats_loss=0.008402, ecapa_loss=0.0001651, whisper_loss=0.1093, over 20770.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001614, whisper_loss=0.09096, over 3853820.51 frames.
], batch size: 79, lr: 3.88e-03, grad_scale: 1.152921504606847e+18
2024-08-13 20:12:32,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2296410.0, ans=0.125
2024-08-13 20:12:32,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2296410.0, ans=0.125
2024-08-13 20:12:38,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.422e+01 2.679e+01 2.912e+01 4.424e+01, threshold=5.358e+01, percent-clipped=0.0
2024-08-13 20:12:41,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2296410.0, ans=10.0
2024-08-13 20:13:30,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2296710.0, ans=0.07
2024-08-13 20:13:36,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12300, loss[loss=0.1191, beats_loss=0.009303, ecapa_loss=0.0001552, whisper_loss=0.1083, over 22012.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.0001624, whisper_loss=0.09133, over 3889184.53 frames. ], batch size: 84, lr: 3.88e-03, grad_scale: 1.152921504606847e+18
2024-08-13 20:13:38,535 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 22 from Vox, 36 fro AS
2024-08-13 20:13:51,998 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
24 from LS+wenet, 24 from Vox, 43 fro AS
2024-08-13 20:13:52,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2296910.0, ans=0.125
2024-08-13 20:13:54,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2296910.0, ans=0.0
2024-08-13 20:14:00,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2296910.0, ans=15.0
2024-08-13 20:14:10,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5
2024-08-13 20:14:13,862 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS
2024-08-13 20:14:14,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2297010.0, ans=0.0
2024-08-13 20:14:18,341 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 24 from Vox, 21 fro AS
2024-08-13 20:14:30,169 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS
2024-08-13 20:14:40,409 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 11 from Vox, 28 fro AS
2024-08-13 20:14:43,550 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS
2024-08-13 20:14:55,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12350, loss[loss=0.09682, beats_loss=0.01119, ecapa_loss=0.0001755, whisper_loss=0.08388, over 21634.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001641, whisper_loss=0.09162, over 3865231.70 frames.
], batch size: 93, lr: 3.87e-03, grad_scale: 1.152921504606847e+18
2024-08-13 20:15:12,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5
2024-08-13 20:15:18,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.416e+01 2.631e+01 3.029e+01 4.449e+01, threshold=5.262e+01, percent-clipped=0.0
2024-08-13 20:15:21,090 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 fro AS
2024-08-13 20:15:37,688 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 16 from Vox, 40 fro AS
2024-08-13 20:15:42,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2297510.0, ans=0.125
2024-08-13 20:15:49,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0
2024-08-13 20:15:50,318 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 21 from Vox, 25 fro AS
2024-08-13 20:15:55,438 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS
2024-08-13 20:16:18,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12400, loss[loss=0.1025, beats_loss=0.01178, ecapa_loss=0.0001376, whisper_loss=0.08939, over 22457.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01061, ecapa_loss=0.0001625, whisper_loss=0.09175, over 3868539.13 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 1.152921504606847e+18
2024-08-13 20:16:42,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2297910.0, ans=0.125
2024-08-13 20:16:47,137 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
17 from LS+wenet, 17 from Vox, 28 fro AS
2024-08-13 20:16:58,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5
2024-08-13 20:17:09,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2298110.0, ans=0.0
2024-08-13 20:17:23,391 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 26 from Vox, 25 fro AS
2024-08-13 20:17:36,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12450, loss[loss=0.1051, beats_loss=0.01081, ecapa_loss=0.0001756, whisper_loss=0.09256, over 15489.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.0001631, whisper_loss=0.09136, over 3825006.88 frames. ], batch size: 62, lr: 3.87e-03, grad_scale: 1.152921504606847e+18
2024-08-13 20:17:57,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.445e+01 2.805e+01 3.307e+01 1.075e+02, threshold=5.611e+01, percent-clipped=1.0
2024-08-13 20:18:01,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2298410.0, ans=0.07
2024-08-13 20:18:02,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2298410.0, ans=0.07
2024-08-13 20:18:28,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2298610.0, ans=0.5
2024-08-13 20:18:32,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0
2024-08-13 20:18:34,902 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
23 from LS+wenet, 24 from Vox, 43 fro AS
2024-08-13 20:18:40,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2298710.0, ans=0.125
2024-08-13 20:18:42,676 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 20 from Vox, 27 fro AS
2024-08-13 20:18:46,420 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS
2024-08-13 20:18:51,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2298710.0, ans=0.1
2024-08-13 20:18:53,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12500, loss[loss=0.08904, beats_loss=0.01224, ecapa_loss=0.0001793, whisper_loss=0.07501, over 22621.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.0001631, whisper_loss=0.09122, over 3846075.52 frames. ], batch size: 93, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:19:02,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2298810.0, ans=0.125
2024-08-13 20:19:04,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0
2024-08-13 20:19:06,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2298810.0, ans=0.125
2024-08-13 20:19:06,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2298810.0, ans=0.125
2024-08-13 20:19:13,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2298910.0, ans=0.2
2024-08-13 20:19:18,422 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts.
20 from LS+wenet, 10 from Vox, 33 fro AS
2024-08-13 20:19:19,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0
2024-08-13 20:19:44,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2299110.0, ans=0.125
2024-08-13 20:19:47,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2299110.0, ans=0.2
2024-08-13 20:19:51,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2299110.0, ans=0.125
2024-08-13 20:19:53,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2299110.0, ans=0.1
2024-08-13 20:20:06,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2299210.0, ans=0.2
2024-08-13 20:20:13,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12550, loss[loss=0.08409, beats_loss=0.01189, ecapa_loss=0.000178, whisper_loss=0.07042, over 19993.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001641, whisper_loss=0.09068, over 3866449.90 frames. ], batch size: 85, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:20:34,976 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
19 from LS+wenet, 16 from Vox, 31 fro AS
2024-08-13 20:20:37,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2299410.0, ans=0.1
2024-08-13 20:20:37,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.461e+01 2.791e+01 3.123e+01 5.243e+01, threshold=5.581e+01, percent-clipped=0.0
2024-08-13 20:20:38,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2299410.0, ans=0.5
2024-08-13 20:20:38,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0
2024-08-13 20:21:13,468 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS
2024-08-13 20:21:26,266 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 29 from LS+wenet, 21 from Vox, 24 fro AS
2024-08-13 20:21:32,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2299810.0, ans=0.2
2024-08-13 20:21:32,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12600, loss[loss=0.1166, beats_loss=0.01113, ecapa_loss=0.0001233, whisper_loss=0.1042, over 15016.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01071, ecapa_loss=0.000165, whisper_loss=0.09225, over 3870808.11 frames. ], batch size: 57, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:22:00,057 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS
2024-08-13 20:22:05,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2300010.0, ans=0.0
2024-08-13 20:22:19,431 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS
2024-08-13 20:22:34,832 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
23 from LS+wenet, 25 from Vox, 43 fro AS
2024-08-13 20:22:45,182 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-13 20:22:50,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12650, loss[loss=0.0665, beats_loss=0.01302, ecapa_loss=0.0001697, whisper_loss=0.05178, over 17998.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001653, whisper_loss=0.09182, over 3904594.20 frames. ], batch size: 75, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:22:58,787 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS
2024-08-13 20:23:03,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=12.0
2024-08-13 20:23:08,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2300410.0, ans=0.125
2024-08-13 20:23:13,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.379e+01 2.634e+01 2.946e+01 5.512e+01, threshold=5.269e+01, percent-clipped=0.0
2024-08-13 20:23:18,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0
2024-08-13 20:23:22,797 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS
2024-08-13 20:23:40,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2300610.0, ans=0.125
2024-08-13 20:24:07,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12700, loss[loss=0.08396, beats_loss=0.01234, ecapa_loss=0.0001661, whisper_loss=0.06996, over 19240.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001641, whisper_loss=0.09128, over 3875386.68 frames.
], batch size: 77, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:24:20,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0
2024-08-13 20:25:24,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2301210.0, ans=0.1
2024-08-13 20:25:26,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12750, loss[loss=0.1278, beats_loss=0.008856, ecapa_loss=0.000171, whisper_loss=0.1173, over 18081.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001626, whisper_loss=0.09235, over 3872886.35 frames. ], batch size: 71, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:25:50,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.318e+01 2.587e+01 2.901e+01 2.435e+02, threshold=5.175e+01, percent-clipped=0.0
2024-08-13 20:25:52,883 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.766e+01
2024-08-13 20:25:55,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2301410.0, ans=0.125
2024-08-13 20:25:57,193 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS
2024-08-13 20:26:03,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2301510.0, ans=0.1
2024-08-13 20:26:14,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.62 vs.
limit=22.5
2024-08-13 20:26:22,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2301610.0, ans=0.0
2024-08-13 20:26:45,274 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12800, loss[loss=0.09387, beats_loss=0.01159, ecapa_loss=0.0001818, whisper_loss=0.08046, over 17138.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001634, whisper_loss=0.09212, over 3882928.77 frames. ], batch size: 71, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:26:47,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2301810.0, ans=0.2
2024-08-13 20:27:01,713 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS
2024-08-13 20:27:18,395 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.583e-01
2024-08-13 20:27:44,504 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 23 from Vox, 30 fro AS
2024-08-13 20:27:59,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2302210.0, ans=0.125
2024-08-13 20:28:03,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12850, loss[loss=0.09518, beats_loss=0.01084, ecapa_loss=0.000148, whisper_loss=0.08286, over 22341.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01087, ecapa_loss=0.0001637, whisper_loss=0.09101, over 3859987.07 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17
2024-08-13 20:28:13,594 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts.
14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-13 20:28:15,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2302310.0, ans=0.2 2024-08-13 20:28:26,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.352e+01 2.567e+01 2.932e+01 5.459e+01, threshold=5.134e+01, percent-clipped=2.0 2024-08-13 20:28:31,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2302410.0, ans=0.125 2024-08-13 20:28:32,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=12.0 2024-08-13 20:28:36,111 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 20:28:51,907 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 20:29:18,632 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 20:29:20,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12900, loss[loss=0.1066, beats_loss=0.01194, ecapa_loss=0.0001974, whisper_loss=0.09273, over 21370.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01094, ecapa_loss=0.0001617, whisper_loss=0.09002, over 3854107.66 frames. ], batch size: 87, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:29:32,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=12.0 2024-08-13 20:29:34,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. 
limit=22.5 2024-08-13 20:29:40,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2302910.0, ans=0.125 2024-08-13 20:29:41,502 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 20:29:41,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2302910.0, ans=0.1 2024-08-13 20:30:04,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2303010.0, ans=0.125 2024-08-13 20:30:09,210 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 20:30:16,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2303110.0, ans=0.125 2024-08-13 20:30:23,382 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 20:30:30,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2024-08-13 20:30:31,736 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 20:30:39,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-08-13 20:30:39,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 12950, loss[loss=0.1002, beats_loss=0.01027, ecapa_loss=0.0001639, whisper_loss=0.08827, over 19064.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001618, whisper_loss=0.09037, over 3839268.48 frames. 
], batch size: 78, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:31:01,900 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.283e+01 2.671e+01 2.992e+01 6.489e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 20:31:22,114 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 20:31:26,015 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 38 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 20:31:37,883 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 20:31:40,856 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-13 20:31:45,175 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-13 20:31:47,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2303710.0, ans=0.125 2024-08-13 20:31:49,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2303710.0, ans=0.125 2024-08-13 20:31:50,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2303710.0, ans=0.0 2024-08-13 20:31:59,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13000, loss[loss=0.1097, beats_loss=0.008601, ecapa_loss=0.0001733, whisper_loss=0.0994, over 22075.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01073, ecapa_loss=0.0001631, whisper_loss=0.09119, over 3863486.63 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:32:07,876 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 20:32:10,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2303810.0, ans=0.125 2024-08-13 20:32:16,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2303810.0, ans=0.2 2024-08-13 20:32:41,327 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 20:32:45,490 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 20:32:50,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2304010.0, ans=0.125 2024-08-13 20:33:10,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-13 20:33:23,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13050, loss[loss=0.08887, beats_loss=0.008073, ecapa_loss=0.0002361, whisper_loss=0.07843, over 13261.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.000163, whisper_loss=0.09105, over 3846188.27 frames. ], batch size: 57, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:33:27,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2304310.0, ans=0.0 2024-08-13 20:33:45,644 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 20:33:46,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-13 20:33:49,644 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 20:33:55,488 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.312e+01 2.609e+01 2.956e+01 5.975e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-13 20:34:01,446 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 20:34:40,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2304610.0, ans=0.0 2024-08-13 20:34:59,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2024-08-13 20:35:16,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13100, loss[loss=0.1135, beats_loss=0.01196, ecapa_loss=0.0001284, whisper_loss=0.1003, over 23598.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001619, whisper_loss=0.09181, over 3869205.31 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:35:18,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. 
limit=22.5 2024-08-13 20:35:25,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2304810.0, ans=0.0 2024-08-13 20:35:35,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2304910.0, ans=0.125 2024-08-13 20:35:40,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2304910.0, ans=0.125 2024-08-13 20:35:58,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2305010.0, ans=0.0 2024-08-13 20:36:00,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2305010.0, ans=0.1 2024-08-13 20:36:04,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2305010.0, ans=0.2 2024-08-13 20:36:07,754 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 20:36:35,297 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 20:36:48,854 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-13 20:37:02,493 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13150, loss[loss=0.1096, beats_loss=0.007864, ecapa_loss=0.0001571, whisper_loss=0.1001, over 20381.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01073, ecapa_loss=0.0001608, whisper_loss=0.09206, over 3877105.53 frames. 
], batch size: 75, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:37:30,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2305410.0, ans=0.125 2024-08-13 20:37:39,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.460e+01 2.677e+01 2.995e+01 4.365e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-13 20:37:45,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2305410.0, ans=0.07 2024-08-13 20:37:53,034 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 20:38:10,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2024-08-13 20:38:16,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2305610.0, ans=0.0 2024-08-13 20:38:41,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2305710.0, ans=0.2 2024-08-13 20:38:56,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2305710.0, ans=0.125 2024-08-13 20:39:04,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13200, loss[loss=0.06164, beats_loss=0.01073, ecapa_loss=0.0001928, whisper_loss=0.04898, over 12999.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001607, whisper_loss=0.0915, over 3852404.34 frames. ], batch size: 54, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:39:12,220 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 20:40:02,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2306010.0, ans=0.0 2024-08-13 20:40:17,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2306010.0, ans=0.125 2024-08-13 20:40:27,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2306110.0, ans=0.1 2024-08-13 20:40:30,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=2306110.0, ans=12.0 2024-08-13 20:40:55,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-13 20:41:04,791 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 20:41:12,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13250, loss[loss=0.11, beats_loss=0.009555, ecapa_loss=0.0001348, whisper_loss=0.09914, over 18592.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01069, ecapa_loss=0.0001619, whisper_loss=0.09203, over 3834564.98 frames. ], batch size: 69, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:41:48,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.383e+01 2.626e+01 2.998e+01 4.392e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-13 20:42:27,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.57 vs. 
limit=12.0 2024-08-13 20:42:35,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2306610.0, ans=0.2 2024-08-13 20:42:41,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2306710.0, ans=0.2 2024-08-13 20:42:53,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13300, loss[loss=0.1189, beats_loss=0.007647, ecapa_loss=0.0001665, whisper_loss=0.1096, over 16071.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01069, ecapa_loss=0.0001614, whisper_loss=0.09197, over 3835597.73 frames. ], batch size: 60, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:43:01,652 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08281490951776505, model_norm_threshold=52.51310729980469 2024-08-13 20:43:01,876 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.983e+04, grad_sumsq=6.983e+04, orig_rms_sq=1.000e+00 2024-08-13 20:43:03,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2306810.0, ans=0.2 2024-08-13 20:43:13,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2306910.0, ans=0.2 2024-08-13 20:43:15,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2306910.0, ans=0.0 2024-08-13 20:43:17,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2306910.0, ans=0.125 2024-08-13 20:43:46,977 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 20:43:48,836 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 20:44:08,251 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 20:44:13,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13350, loss[loss=0.1104, beats_loss=0.008339, ecapa_loss=0.000141, whisper_loss=0.1007, over 14327.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001613, whisper_loss=0.092, over 3814714.76 frames. ], batch size: 53, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:44:33,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2307410.0, ans=0.125 2024-08-13 20:44:38,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.444e+01 2.749e+01 3.154e+01 6.341e+02, threshold=5.498e+01, percent-clipped=1.0 2024-08-13 20:44:44,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2307410.0, ans=0.2 2024-08-13 20:44:47,568 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 20:44:49,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2307510.0, ans=0.125 2024-08-13 20:45:15,595 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 20:45:23,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2307710.0, ans=0.125 2024-08-13 20:45:26,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2307710.0, ans=0.1 2024-08-13 20:45:28,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2307710.0, ans=0.0 2024-08-13 20:45:34,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13400, loss[loss=0.1069, beats_loss=0.009911, ecapa_loss=0.0001601, whisper_loss=0.09534, over 17501.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001621, whisper_loss=0.09219, over 3836860.42 frames. ], batch size: 70, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:45:49,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-13 20:46:03,906 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 20:46:11,749 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 20:46:14,769 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 30 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 20:46:24,618 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 20:46:25,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2308110.0, ans=0.0 2024-08-13 20:46:37,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2308210.0, ans=0.0 2024-08-13 20:46:46,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2308210.0, ans=0.0 2024-08-13 20:46:54,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13450, loss[loss=0.1094, beats_loss=0.01084, ecapa_loss=0.0001659, whisper_loss=0.09687, over 17001.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001622, whisper_loss=0.09122, over 3853052.89 frames. ], batch size: 66, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:46:59,882 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 20:47:00,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-13 20:47:07,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2308310.0, ans=0.2 2024-08-13 20:47:12,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2308410.0, ans=0.125 2024-08-13 20:47:17,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.358e+01 2.676e+01 3.282e+01 1.336e+02, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 20:47:21,653 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 20:47:55,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2308710.0, ans=0.2 2024-08-13 20:47:57,827 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 20:48:13,706 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13500, loss[loss=0.08107, beats_loss=0.01092, ecapa_loss=0.0001499, whisper_loss=0.06866, over 14758.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001629, whisper_loss=0.09085, over 3841727.53 frames. ], batch size: 57, lr: 3.87e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:48:25,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2308810.0, ans=0.125 2024-08-13 20:48:29,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2024-08-13 20:48:40,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.80 vs. limit=5.0 2024-08-13 20:48:43,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2308910.0, ans=0.0 2024-08-13 20:49:15,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=12.0 2024-08-13 20:49:24,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2309210.0, ans=0.125 2024-08-13 20:49:29,647 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 20:49:36,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13550, loss[loss=0.1048, beats_loss=0.01125, ecapa_loss=0.0001721, whisper_loss=0.0918, over 22382.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001623, whisper_loss=0.09157, over 3871944.07 frames. ], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:50:00,060 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 20:50:00,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.439e+01 2.648e+01 3.076e+01 1.090e+02, threshold=5.296e+01, percent-clipped=1.0 2024-08-13 20:50:03,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2309410.0, ans=0.1 2024-08-13 20:50:17,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.72 vs. limit=12.0 2024-08-13 20:50:23,218 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 20:50:32,623 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 20:50:53,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2309710.0, ans=0.0 2024-08-13 20:50:56,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13600, loss[loss=0.09552, beats_loss=0.01182, ecapa_loss=0.0001469, whisper_loss=0.08223, over 18060.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.000162, whisper_loss=0.09171, over 3852422.91 frames. 
], batch size: 74, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:51:00,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2309810.0, ans=0.0 2024-08-13 20:51:17,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2024-08-13 20:51:18,138 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 20:51:18,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2309910.0, ans=0.125 2024-08-13 20:51:27,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2310010.0, ans=0.125 2024-08-13 20:51:32,023 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 20:51:32,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2310010.0, ans=0.1 2024-08-13 20:51:43,024 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 20:51:49,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2310110.0, ans=0.2 2024-08-13 20:52:15,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13650, loss[loss=0.1027, beats_loss=0.009512, ecapa_loss=0.0001597, whisper_loss=0.09159, over 19628.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001617, whisper_loss=0.09192, over 3900769.52 frames. 
], batch size: 75, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:52:34,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2310410.0, ans=0.125 2024-08-13 20:52:37,720 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.402e+01 2.676e+01 3.034e+01 5.771e+01, threshold=5.352e+01, percent-clipped=1.0 2024-08-13 20:52:43,942 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 20:52:48,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2310510.0, ans=0.125 2024-08-13 20:53:07,666 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.932e+01 2024-08-13 20:53:07,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2310610.0, ans=0.0 2024-08-13 20:53:16,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-13 20:53:20,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2310710.0, ans=0.125 2024-08-13 20:53:23,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2310710.0, ans=15.0 2024-08-13 20:53:30,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13700, loss[loss=0.0993, beats_loss=0.01292, ecapa_loss=0.0001235, whisper_loss=0.08514, over 23462.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001615, whisper_loss=0.09182, over 3916394.31 frames. 
], batch size: 91, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:53:33,867 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 20:53:38,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2310810.0, ans=0.1 2024-08-13 20:53:47,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2310910.0, ans=0.125 2024-08-13 20:54:00,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2311010.0, ans=0.0 2024-08-13 20:54:11,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2311010.0, ans=0.0 2024-08-13 20:54:17,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-13 20:54:23,752 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 20:54:27,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2311110.0, ans=0.0 2024-08-13 20:54:30,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2311210.0, ans=0.125 2024-08-13 20:54:31,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2311210.0, ans=0.1 2024-08-13 20:54:33,897 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-13 20:54:39,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2311210.0, ans=0.125 2024-08-13 20:54:41,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2311210.0, ans=0.125 2024-08-13 20:54:43,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13750, loss[loss=0.1181, beats_loss=0.007077, ecapa_loss=0.0001569, whisper_loss=0.1094, over 22198.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001615, whisper_loss=0.09162, over 3928550.24 frames. ], batch size: 83, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:54:44,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2311310.0, ans=0.0 2024-08-13 20:54:50,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.27 vs. 
limit=22.5 2024-08-13 20:54:52,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2311310.0, ans=0.125 2024-08-13 20:54:56,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2311410.0, ans=0.09899494936611666 2024-08-13 20:55:05,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.299e+01 2.662e+01 2.929e+01 4.195e+01, threshold=5.323e+01, percent-clipped=0.0 2024-08-13 20:55:06,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2311410.0, ans=0.125 2024-08-13 20:55:29,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2311610.0, ans=0.0 2024-08-13 20:55:42,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2311710.0, ans=0.0 2024-08-13 20:55:50,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-13 20:55:50,889 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 20:55:53,901 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 20:55:57,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13800, loss[loss=0.09691, beats_loss=0.01013, ecapa_loss=0.0001543, whisper_loss=0.08524, over 22610.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001611, whisper_loss=0.09161, over 3920899.61 frames. 
], batch size: 89, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:55:59,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.15 vs. limit=22.5 2024-08-13 20:56:28,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-13 20:56:32,182 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 20:56:54,394 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 20:56:55,539 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 20:57:06,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2312210.0, ans=0.125 2024-08-13 20:57:07,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2024-08-13 20:57:08,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13850, loss[loss=0.1082, beats_loss=0.01194, ecapa_loss=0.0001665, whisper_loss=0.09458, over 22719.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.000161, whisper_loss=0.09163, over 3908029.90 frames. 
], batch size: 90, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:57:28,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2312410.0, ans=0.125 2024-08-13 20:57:31,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.385e+01 2.789e+01 3.325e+01 4.881e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-13 20:57:50,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2312510.0, ans=0.0 2024-08-13 20:58:05,819 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 20:58:21,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13900, loss[loss=0.1025, beats_loss=0.01231, ecapa_loss=0.0001455, whisper_loss=0.08875, over 22262.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01078, ecapa_loss=0.0001606, whisper_loss=0.09234, over 3914214.44 frames. ], batch size: 89, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:58:33,281 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 20:58:46,833 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 20:58:58,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-13 20:59:10,167 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 20:59:34,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 13950, loss[loss=0.07543, beats_loss=0.0138, ecapa_loss=0.0001075, whisper_loss=0.06056, over 21978.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001613, whisper_loss=0.09192, over 3878011.87 frames. 
], batch size: 87, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 20:59:34,715 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 20:59:35,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2313310.0, ans=0.125 2024-08-13 20:59:39,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2313310.0, ans=0.07 2024-08-13 20:59:56,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.456e+01 2.773e+01 3.183e+01 5.275e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-13 21:00:00,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2313410.0, ans=0.0 2024-08-13 21:00:01,658 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 21:00:23,315 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 21:00:48,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14000, loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001519, whisper_loss=0.09138, over 16358.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001605, whisper_loss=0.09222, over 3881908.97 frames. ], batch size: 66, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:00:52,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2313810.0, ans=0.125 2024-08-13 21:00:52,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2313810.0, ans=0.125 2024-08-13 21:00:59,028 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 21:01:06,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2313910.0, ans=0.0 2024-08-13 21:01:07,325 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 21:01:07,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2313910.0, ans=0.2 2024-08-13 21:01:31,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2314110.0, ans=0.125 2024-08-13 21:01:40,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2314110.0, ans=0.1 2024-08-13 21:01:42,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-08-13 21:01:52,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2314210.0, ans=0.025 2024-08-13 21:01:57,772 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-13 21:02:02,962 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14050, loss[loss=0.09062, beats_loss=0.01032, ecapa_loss=0.0002137, whisper_loss=0.07817, over 17929.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001592, whisper_loss=0.09227, over 3869145.66 frames. ], batch size: 76, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:02:03,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2314310.0, ans=0.125 2024-08-13 21:02:07,330 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 21:02:11,237 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-13 21:02:24,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.387e+01 2.626e+01 2.873e+01 4.572e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-13 21:02:25,874 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 21:02:28,692 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 21:03:06,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2024-08-13 21:03:15,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14100, loss[loss=0.1083, beats_loss=0.009244, ecapa_loss=0.0001388, whisper_loss=0.09762, over 16279.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001597, whisper_loss=0.09172, over 3847817.95 frames. ], batch size: 61, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:03:38,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2314910.0, ans=0.0 2024-08-13 21:03:48,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2315010.0, ans=0.2 2024-08-13 21:04:02,061 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 21:04:06,145 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 21:04:26,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2315310.0, ans=0.125 2024-08-13 21:04:26,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14150, loss[loss=0.1166, beats_loss=0.01117, ecapa_loss=0.0001247, whisper_loss=0.1041, over 22976.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001599, whisper_loss=0.09213, over 3889710.98 frames. ], batch size: 89, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:04:47,421 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.708e+01 2024-08-13 21:04:49,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.465e+01 2.633e+01 2.994e+01 4.985e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-13 21:05:00,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2315510.0, ans=0.125 2024-08-13 21:05:05,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=12.0 2024-08-13 21:05:24,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2315610.0, ans=0.0 2024-08-13 21:05:24,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2315610.0, ans=0.125 2024-08-13 21:05:41,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14200, loss[loss=0.101, beats_loss=0.01114, ecapa_loss=0.00013, whisper_loss=0.08856, over 23129.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001603, whisper_loss=0.09179, over 3886567.09 frames. 
], batch size: 89, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:05:54,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2315810.0, ans=0.125 2024-08-13 21:06:03,474 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:06:25,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2316010.0, ans=0.125 2024-08-13 21:06:25,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2316010.0, ans=0.125 2024-08-13 21:06:39,895 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 34 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-13 21:06:41,935 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 21:07:00,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14250, loss[loss=0.112, beats_loss=0.01096, ecapa_loss=0.000136, whisper_loss=0.09967, over 19864.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001578, whisper_loss=0.09198, over 3910322.53 frames. ], batch size: 77, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:07:09,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2316310.0, ans=0.125 2024-08-13 21:07:15,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. 
limit=15.0 2024-08-13 21:07:22,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2316410.0, ans=0.0 2024-08-13 21:07:24,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.489e+01 2.737e+01 3.188e+01 4.877e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-13 21:07:33,182 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 21:07:52,245 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 21:08:02,460 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 21:08:03,967 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-13 21:08:17,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14300, loss[loss=0.07755, beats_loss=0.01287, ecapa_loss=0.0001678, whisper_loss=0.06301, over 17271.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001592, whisper_loss=0.0923, over 3886879.08 frames. ], batch size: 70, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:08:17,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-13 21:08:20,198 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 21:08:31,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2316910.0, ans=0.025 2024-08-13 21:08:42,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2316910.0, ans=0.2 2024-08-13 21:08:55,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-13 21:08:57,010 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 21:09:04,629 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-13 21:09:11,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2317110.0, ans=0.125 2024-08-13 21:09:12,543 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 21:09:26,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5 2024-08-13 21:09:33,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14350, loss[loss=0.1048, beats_loss=0.0107, ecapa_loss=0.0001415, whisper_loss=0.09267, over 20692.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001585, whisper_loss=0.09139, over 3932195.71 frames. ], batch size: 82, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:09:35,254 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:09:36,278 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 21:09:37,906 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 21:09:38,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-13 21:09:43,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2024-08-13 21:09:56,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.382e+01 2.716e+01 3.017e+01 1.009e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-13 21:10:04,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2317510.0, ans=0.125 2024-08-13 21:10:14,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2317510.0, ans=0.125 2024-08-13 21:10:15,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2317510.0, ans=0.125 2024-08-13 21:10:20,988 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 21:10:33,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2317710.0, ans=0.125 2024-08-13 21:10:34,536 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 21:10:34,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2317710.0, ans=0.125 2024-08-13 21:10:37,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-13 21:10:39,840 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 21:10:49,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14400, loss[loss=0.1111, beats_loss=0.01173, ecapa_loss=0.0001266, whisper_loss=0.09807, over 22580.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001614, whisper_loss=0.09136, over 3944617.48 frames. ], batch size: 86, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:10:51,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2317810.0, ans=0.025 2024-08-13 21:11:01,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=15.0 2024-08-13 21:11:09,957 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 21:11:10,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2317910.0, ans=0.0 2024-08-13 21:11:12,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2317910.0, ans=0.0 2024-08-13 21:11:15,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2317910.0, ans=0.0 2024-08-13 21:11:18,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2318010.0, ans=0.125 2024-08-13 21:11:22,970 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 21:11:25,977 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
12 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 21:11:35,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2318110.0, ans=0.125 2024-08-13 21:11:37,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2318110.0, ans=0.2 2024-08-13 21:11:39,795 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 21:12:05,881 INFO [train_multi_KD3.py:1116] (3/4) Epoch 16, batch 14450, loss[loss=0.1025, beats_loss=0.01086, ecapa_loss=0.0001499, whisper_loss=0.09011, over 17677.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.000161, whisper_loss=0.09153, over 3940459.97 frames. ], batch size: 68, lr: 3.86e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:12:09,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2318310.0, ans=0.125 2024-08-13 21:12:16,472 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 21:12:22,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2318410.0, ans=0.125 2024-08-13 21:12:28,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.376e+01 2.680e+01 3.028e+01 6.046e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-13 21:12:31,009 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 21:12:31,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2318410.0, ans=0.1 2024-08-13 21:12:40,886 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 21:12:42,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2318510.0, ans=0.125 2024-08-13 21:12:44,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2318510.0, ans=0.125 2024-08-13 21:12:49,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2318610.0, ans=0.0 2024-08-13 21:13:02,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2318710.0, ans=10.0 2024-08-13 21:13:47,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 0, loss[loss=0.1136, beats_loss=0.005559, ecapa_loss=0.0002224, whisper_loss=0.1058, over 17946.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.005559, ecapa_loss=0.0002224, whisper_loss=0.1058, over 17946.00 frames. ], batch size: 72, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:13:47,661 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 21:14:29,951 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005592, whisper_loss=0.2478, over 922467.00 frames. 2024-08-13 21:14:46,252 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on SV_voxceleb1: loss=0.004509, beats_loss=0, ecapa_loss=0.0004509, whisper_loss=0, over 939242.00 frames. 2024-08-13 21:16:46,379 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on AT_audioset: loss=0.02361, beats_loss=0.02361, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 21:16:46,386 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-13 21:16:57,131 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 21:16:58,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2318730.0, ans=0.125 2024-08-13 21:17:01,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2318730.0, ans=0.0 2024-08-13 21:17:04,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2318730.0, ans=0.05 2024-08-13 21:17:04,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2318730.0, ans=0.125 2024-08-13 21:17:04,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=2318730.0, ans=22.5 2024-08-13 21:17:08,429 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 21:17:10,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2318730.0, ans=0.125 2024-08-13 21:17:28,735 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-13 21:17:30,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2318830.0, ans=0.2 2024-08-13 21:17:46,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2318930.0, ans=0.1 2024-08-13 21:17:50,303 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-13 21:18:55,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. 
limit=15.0 2024-08-13 21:18:58,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 50, loss[loss=0.09367, beats_loss=0.008924, ecapa_loss=0.0001511, whisper_loss=0.08324, over 16805.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.00983, ecapa_loss=0.0001627, whisper_loss=0.09197, over 887643.28 frames. ], batch size: 67, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:19:08,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0 2024-08-13 21:19:12,823 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 21:19:25,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2319330.0, ans=0.125 2024-08-13 21:19:42,876 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 21:19:43,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=15.0 2024-08-13 21:19:51,994 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 21:19:54,555 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 21:19:55,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.701e+01 3.109e+01 3.430e+01 6.788e+01, threshold=6.217e+01, percent-clipped=2.0 2024-08-13 21:19:57,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2319430.0, ans=0.125 2024-08-13 21:20:01,852 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 21:20:02,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0 2024-08-13 21:20:13,526 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 21:20:15,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2319530.0, ans=0.0 2024-08-13 21:20:59,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2024-08-13 21:21:00,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 100, loss[loss=0.1049, beats_loss=0.01074, ecapa_loss=0.0001276, whisper_loss=0.09293, over 23294.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009834, ecapa_loss=0.0001635, whisper_loss=0.09092, over 1536931.01 frames. ], batch size: 89, lr: 3.74e-03, grad_scale: 1.152921504606847e+18 2024-08-13 21:21:04,922 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 13 from Vox, 54 fro AS 2024-08-13 21:21:25,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2319830.0, ans=0.04949747468305833 2024-08-13 21:21:28,835 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-13 21:21:32,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-13 21:21:36,256 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 21:21:51,433 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-13 21:21:56,099 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
15 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-13 21:22:39,026 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS
2024-08-13 21:22:40,791 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS
2024-08-13 21:22:52,633 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 150, loss[loss=0.09908, beats_loss=0.009629, ecapa_loss=0.0001609, whisper_loss=0.08784, over 22271.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009952, ecapa_loss=0.000162, whisper_loss=0.09038, over 2046471.64 frames. ], batch size: 87, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:23:06,472 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 19 from Vox, 34 fro AS
2024-08-13 21:23:27,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.64 vs. limit=15.0
2024-08-13 21:23:32,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.591e+01 2.910e+01 3.226e+01 4.259e+01, threshold=5.820e+01, percent-clipped=0.0
2024-08-13 21:23:35,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2320430.0, ans=0.0
2024-08-13 21:23:38,024 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 32 from Vox, 24 fro AS
2024-08-13 21:23:38,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2320430.0, ans=0.0
2024-08-13 21:23:44,664 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 19 from Vox, 42 fro AS
2024-08-13 21:23:48,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2320530.0, ans=0.125
2024-08-13 21:24:07,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2320630.0, ans=0.125
2024-08-13 21:24:16,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 200, loss[loss=0.09155, beats_loss=0.009527, ecapa_loss=0.0002024, whisper_loss=0.08, over 17605.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009969, ecapa_loss=0.0001645, whisper_loss=0.09078, over 2435398.55 frames. ], batch size: 73, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:24:16,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2320730.0, ans=0.125
2024-08-13 21:24:32,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2320830.0, ans=0.125
2024-08-13 21:24:35,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2320830.0, ans=0.0
2024-08-13 21:24:37,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2320830.0, ans=0.1
2024-08-13 21:25:13,597 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 fro AS
2024-08-13 21:25:21,123 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.333e+05
2024-08-13 21:25:23,415 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 27 from Vox, 28 fro AS
2024-08-13 21:25:28,788 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS
2024-08-13 21:25:38,712 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 250, loss[loss=0.09973, beats_loss=0.01235, ecapa_loss=0.0001411, whisper_loss=0.08596, over 17612.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01018, ecapa_loss=0.0001643, whisper_loss=0.0912, over 2720795.63 frames. ], batch size: 68, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:26:02,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2321330.0, ans=0.0
2024-08-13 21:26:09,237 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS
2024-08-13 21:26:15,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2321430.0, ans=0.125
2024-08-13 21:26:15,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2321430.0, ans=0.125
2024-08-13 21:26:17,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.338e+01 2.625e+01 3.056e+01 3.496e+02, threshold=5.250e+01, percent-clipped=1.0
2024-08-13 21:26:25,032 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS
2024-08-13 21:27:02,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 300, loss[loss=0.1077, beats_loss=0.008992, ecapa_loss=0.0002048, whisper_loss=0.09668, over 22925.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001633, whisper_loss=0.0907, over 2961267.52 frames. ], batch size: 93, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:27:29,453 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS
2024-08-13 21:27:35,388 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 18 from Vox, 20 fro AS
2024-08-13 21:27:39,799 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 28 from Vox, 31 fro AS
2024-08-13 21:28:29,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 350, loss[loss=0.09478, beats_loss=0.01124, ecapa_loss=0.0001538, whisper_loss=0.082, over 13843.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001626, whisper_loss=0.08989, over 3146485.05 frames. ], batch size: 55, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:28:36,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-08-13 21:28:52,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2322330.0, ans=0.2
2024-08-13 21:28:55,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0
2024-08-13 21:29:02,187 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 21:29:08,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.409e+01 2.739e+01 3.112e+01 5.763e+01, threshold=5.479e+01, percent-clipped=3.0
2024-08-13 21:29:08,570 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS
2024-08-13 21:29:41,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2322630.0, ans=0.125
2024-08-13 21:29:55,075 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 9 from Vox, 26 fro AS
2024-08-13 21:29:57,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 400, loss[loss=0.1018, beats_loss=0.01176, ecapa_loss=0.0001504, whisper_loss=0.08853, over 22379.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001623, whisper_loss=0.09094, over 3305613.24 frames. ], batch size: 92, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:30:21,129 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS
2024-08-13 21:30:24,309 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-13 21:30:40,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0
2024-08-13 21:30:44,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2322930.0, ans=0.125
2024-08-13 21:30:46,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2322930.0, ans=0.125
2024-08-13 21:30:49,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2323030.0, ans=0.0
2024-08-13 21:30:49,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2323030.0, ans=0.125
2024-08-13 21:30:55,591 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 14 from Vox, 39 fro AS
2024-08-13 21:30:57,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2323030.0, ans=0.0
2024-08-13 21:31:08,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2323130.0, ans=0.125
2024-08-13 21:31:25,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 450, loss[loss=0.09367, beats_loss=0.008882, ecapa_loss=0.0001577, whisper_loss=0.08321, over 15996.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001608, whisper_loss=0.09011, over 3407641.34 frames. ], batch size: 59, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:31:34,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2323230.0, ans=0.125
2024-08-13 21:31:43,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2323330.0, ans=0.1
2024-08-13 21:31:49,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2323330.0, ans=0.0
2024-08-13 21:32:06,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.385e+01 2.600e+01 2.999e+01 5.733e+01, threshold=5.200e+01, percent-clipped=1.0
2024-08-13 21:32:10,647 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.942e+01
2024-08-13 21:32:27,231 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS
2024-08-13 21:32:27,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2323530.0, ans=0.1
2024-08-13 21:32:27,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2323530.0, ans=0.125
2024-08-13 21:32:27,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2323530.0, ans=0.04949747468305833
2024-08-13 21:32:41,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.13 vs. limit=5.0
2024-08-13 21:32:44,659 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS
2024-08-13 21:32:52,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 500, loss[loss=0.1219, beats_loss=0.006039, ecapa_loss=0.0002273, whisper_loss=0.1136, over 15752.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001603, whisper_loss=0.08992, over 3503452.82 frames. ], batch size: 61, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:32:56,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2323730.0, ans=0.1
2024-08-13 21:33:15,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2323830.0, ans=0.1
2024-08-13 21:33:16,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2323830.0, ans=0.0
2024-08-13 21:33:16,820 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.754e+05
2024-08-13 21:33:32,347 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-13 21:33:41,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=22.5
2024-08-13 21:34:08,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5
2024-08-13 21:34:13,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 550, loss[loss=0.1147, beats_loss=0.0093, ecapa_loss=0.0001582, whisper_loss=0.1039, over 16129.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001597, whisper_loss=0.09002, over 3584460.03 frames. ], batch size: 64, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:34:18,961 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 21 from Vox, 18 fro AS
2024-08-13 21:34:32,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0
2024-08-13 21:34:32,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0
2024-08-13 21:34:50,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2324430.0, ans=0.125
2024-08-13 21:34:51,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.278e+01 2.510e+01 2.744e+01 4.092e+01, threshold=5.020e+01, percent-clipped=0.0
2024-08-13 21:35:00,841 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 24 from Vox, 31 fro AS
2024-08-13 21:35:02,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2324530.0, ans=0.0
2024-08-13 21:35:10,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2324530.0, ans=0.0
2024-08-13 21:35:31,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 600, loss[loss=0.1139, beats_loss=0.01111, ecapa_loss=0.0001651, whisper_loss=0.1012, over 18400.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001584, whisper_loss=0.08984, over 3618135.67 frames. ], batch size: 72, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:35:37,922 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS
2024-08-13 21:35:40,695 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS
2024-08-13 21:35:42,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2324730.0, ans=0.035
2024-08-13 21:35:49,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2324830.0, ans=0.125
2024-08-13 21:36:04,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2324930.0, ans=0.125
2024-08-13 21:36:27,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2325130.0, ans=0.125
2024-08-13 21:36:30,891 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS
2024-08-13 21:36:36,096 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS
2024-08-13 21:36:37,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2325230.0, ans=0.125
2024-08-13 21:36:38,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 650, loss[loss=0.08914, beats_loss=0.01058, ecapa_loss=0.0001479, whisper_loss=0.07708, over 16548.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001574, whisper_loss=0.08993, over 3628863.04 frames. ], batch size: 63, lr: 3.74e-03, grad_scale: 1.152921504606847e+18
2024-08-13 21:36:45,454 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 fro AS
2024-08-13 21:36:58,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2325330.0, ans=0.0
2024-08-13 21:37:09,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.389e+01 2.704e+01 3.013e+01 8.978e+01, threshold=5.408e+01, percent-clipped=2.0
2024-08-13 21:37:19,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2325530.0, ans=0.1
2024-08-13 21:37:23,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2325530.0, ans=0.0
2024-08-13 21:37:37,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0
2024-08-13 21:37:43,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 700, loss[loss=0.08474, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.07277, over 17320.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001571, whisper_loss=0.09042, over 3664366.05 frames. ], batch size: 67, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:38:04,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0
2024-08-13 21:38:04,839 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-13 21:38:06,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.61 vs. limit=22.5
2024-08-13 21:38:13,845 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS
2024-08-13 21:38:27,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5
2024-08-13 21:38:27,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=8.0
2024-08-13 21:38:43,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2326130.0, ans=0.125
2024-08-13 21:38:48,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 750, loss[loss=0.09126, beats_loss=0.01181, ecapa_loss=0.0001421, whisper_loss=0.07803, over 14236.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001586, whisper_loss=0.08968, over 3709485.77 frames. ], batch size: 55, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:38:54,234 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS
2024-08-13 21:38:58,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2326230.0, ans=0.125
2024-08-13 21:39:19,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.349e+01 2.525e+01 2.805e+01 4.000e+01, threshold=5.049e+01, percent-clipped=0.0
2024-08-13 21:39:32,203 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS
2024-08-13 21:39:32,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2326530.0, ans=0.09899494936611666
2024-08-13 21:39:36,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2326530.0, ans=0.125
2024-08-13 21:39:52,824 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 33 from LS+wenet, 20 from Vox, 27 fro AS
2024-08-13 21:39:54,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 800, loss[loss=0.1225, beats_loss=0.008824, ecapa_loss=0.0002306, whisper_loss=0.1114, over 19331.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001582, whisper_loss=0.08952, over 3758435.20 frames. ], batch size: 80, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:40:02,053 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS
2024-08-13 21:40:03,221 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS
2024-08-13 21:40:11,461 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS
2024-08-13 21:40:11,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2326830.0, ans=0.125
2024-08-13 21:40:21,965 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS
2024-08-13 21:40:32,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0
2024-08-13 21:40:41,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2327030.0, ans=0.125
2024-08-13 21:40:42,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2327030.0, ans=0.1
2024-08-13 21:40:46,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2327130.0, ans=0.125
2024-08-13 21:40:47,994 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.158e+00
2024-08-13 21:40:55,227 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS
2024-08-13 21:40:59,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 850, loss[loss=0.09108, beats_loss=0.01111, ecapa_loss=0.0001604, whisper_loss=0.07836, over 21830.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.000159, whisper_loss=0.08948, over 3785774.51 frames. ], batch size: 88, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:41:01,664 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS
2024-08-13 21:41:03,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5
2024-08-13 21:41:05,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2327230.0, ans=0.0
2024-08-13 21:41:08,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0
2024-08-13 21:41:11,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5
2024-08-13 21:41:22,616 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-13 21:41:30,588 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.387e+01 2.596e+01 3.123e+01 5.757e+01, threshold=5.192e+01, percent-clipped=1.0
2024-08-13 21:41:44,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2327530.0, ans=0.1
2024-08-13 21:42:02,140 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS
2024-08-13 21:42:04,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 900, loss[loss=0.1037, beats_loss=0.007449, ecapa_loss=0.0001761, whisper_loss=0.09446, over 15578.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01059, ecapa_loss=0.0001584, whisper_loss=0.08901, over 3786619.30 frames. ], batch size: 58, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:42:14,138 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS
2024-08-13 21:42:31,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0
2024-08-13 21:42:49,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0
2024-08-13 21:42:51,814 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.581e-02
2024-08-13 21:42:53,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2328030.0, ans=0.125
2024-08-13 21:43:09,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 950, loss[loss=0.1206, beats_loss=0.008899, ecapa_loss=0.0001522, whisper_loss=0.1101, over 21382.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001584, whisper_loss=0.0895, over 3815318.42 frames. ], batch size: 81, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:43:12,578 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS
2024-08-13 21:43:17,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2328230.0, ans=0.125
2024-08-13 21:43:19,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2328230.0, ans=0.2
2024-08-13 21:43:25,137 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 24 from Vox, 27 fro AS
2024-08-13 21:43:30,683 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 20 from Vox, 19 fro AS
2024-08-13 21:43:37,413 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 fro AS
2024-08-13 21:43:38,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-08-13 21:43:39,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2328430.0, ans=0.1
2024-08-13 21:43:41,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.367e+01 2.632e+01 3.016e+01 5.732e+01, threshold=5.263e+01, percent-clipped=3.0
2024-08-13 21:44:04,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2328630.0, ans=0.125
2024-08-13 21:44:15,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1000, loss[loss=0.1152, beats_loss=0.01038, ecapa_loss=0.0001678, whisper_loss=0.1032, over 22408.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.0001577, whisper_loss=0.0899, over 3833718.31 frames. ], batch size: 90, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:44:24,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2328730.0, ans=0.2
2024-08-13 21:44:30,326 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 21:44:40,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2328930.0, ans=0.0
2024-08-13 21:44:43,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2328930.0, ans=0.125
2024-08-13 21:44:56,209 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS
2024-08-13 21:44:56,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2329030.0, ans=0.125
2024-08-13 21:45:14,347 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.170e+01
2024-08-13 21:45:15,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2329130.0, ans=0.0
2024-08-13 21:45:18,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2329130.0, ans=0.125
2024-08-13 21:45:20,995 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1050, loss[loss=0.1162, beats_loss=0.009856, ecapa_loss=0.0002079, whisper_loss=0.1043, over 21928.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001579, whisper_loss=0.08984, over 3834693.12 frames. ], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:45:23,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2329230.0, ans=0.125
2024-08-13 21:45:27,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2329230.0, ans=0.1
2024-08-13 21:45:27,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2329230.0, ans=0.1
2024-08-13 21:45:30,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2329230.0, ans=0.125
2024-08-13 21:45:36,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2329330.0, ans=0.125
2024-08-13 21:45:51,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0
2024-08-13 21:45:52,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.420e+01 2.664e+01 2.978e+01 4.899e+01, threshold=5.328e+01, percent-clipped=0.0
2024-08-13 21:46:21,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2329630.0, ans=0.0
2024-08-13 21:46:26,185 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1100, loss[loss=0.1132, beats_loss=0.01043, ecapa_loss=0.0001482, whisper_loss=0.1013, over 21004.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001601, whisper_loss=0.08963, over 3813327.15 frames. ], batch size: 82, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:46:41,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2329830.0, ans=0.125
2024-08-13 21:46:44,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0
2024-08-13 21:47:09,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2330030.0, ans=0.07
2024-08-13 21:47:21,965 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS
2024-08-13 21:47:27,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2330130.0, ans=0.0
2024-08-13 21:47:31,321 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS
2024-08-13 21:47:31,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2330230.0, ans=0.0
2024-08-13 21:47:32,341 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1150, loss[loss=0.1105, beats_loss=0.009914, ecapa_loss=0.0001373, whisper_loss=0.09924, over 19579.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001595, whisper_loss=0.08949, over 3810833.02 frames. ], batch size: 75, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:47:36,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2330230.0, ans=0.125
2024-08-13 21:47:42,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2330230.0, ans=0.125
2024-08-13 21:47:48,412 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS
2024-08-13 21:47:54,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2330330.0, ans=0.0
2024-08-13 21:48:03,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.427e+01 2.716e+01 3.117e+01 1.034e+02, threshold=5.432e+01, percent-clipped=2.0
2024-08-13 21:48:04,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2330430.0, ans=0.125
2024-08-13 21:48:08,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0
2024-08-13 21:48:10,742 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.113e-02
2024-08-13 21:48:37,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1200, loss[loss=0.1173, beats_loss=0.0077, ecapa_loss=0.0001945, whisper_loss=0.1077, over 13901.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01069, ecapa_loss=0.0001578, whisper_loss=0.08973, over 3793004.31 frames. ], batch size: 56, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:48:48,705 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 fro AS
2024-08-13 21:49:05,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2330930.0, ans=0.125
2024-08-13 21:49:07,273 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS
2024-08-13 21:49:07,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2330930.0, ans=0.5
2024-08-13 21:49:09,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2330930.0, ans=0.0
2024-08-13 21:49:14,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=12.0
2024-08-13 21:49:14,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2330930.0, ans=0.125
2024-08-13 21:49:17,266 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS
2024-08-13 21:49:26,486 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 18 from Vox, 47 fro AS
2024-08-13 21:49:33,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2331130.0, ans=0.1
2024-08-13 21:49:36,604 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS
2024-08-13 21:49:40,531 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS
2024-08-13 21:49:43,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1250, loss[loss=0.09437, beats_loss=0.0135, ecapa_loss=0.0001361, whisper_loss=0.07951, over 22743.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01078, ecapa_loss=0.0001558, whisper_loss=0.0895, over 3793231.75 frames. ], batch size: 92, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:49:52,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2331230.0, ans=0.1
2024-08-13 21:50:11,932 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS
2024-08-13 21:50:14,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.232e+01 2.491e+01 2.766e+01 6.956e+01, threshold=4.983e+01, percent-clipped=1.0
2024-08-13 21:50:18,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2331430.0, ans=0.0
2024-08-13 21:50:27,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2331530.0, ans=0.2
2024-08-13 21:50:31,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2331530.0, ans=0.125
2024-08-13 21:50:43,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2331630.0, ans=0.125
2024-08-13 21:50:48,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1300, loss[loss=0.09762, beats_loss=0.009976, ecapa_loss=0.000165, whisper_loss=0.08599, over 23371.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01072, ecapa_loss=0.0001561, whisper_loss=0.08997, over 3832975.97 frames. ], batch size: 94, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:51:03,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2331830.0, ans=0.2
2024-08-13 21:51:08,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0
2024-08-13 21:51:41,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2332130.0, ans=0.2
2024-08-13 21:51:42,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0
2024-08-13 21:51:56,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1350, loss[loss=0.1059, beats_loss=0.008403, ecapa_loss=0.0001814, whisper_loss=0.09565, over 18731.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01076, ecapa_loss=0.0001557, whisper_loss=0.08966, over 3840645.15 frames. ], batch size: 74, lr: 3.73e-03, grad_scale: 5.764607523034235e+17
2024-08-13 21:51:58,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2332230.0, ans=0.125
2024-08-13 21:52:02,546 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 fro AS
2024-08-13 21:52:13,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2332330.0, ans=0.125
2024-08-13 21:52:14,184 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS
2024-08-13 21:52:30,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.343e+01 2.685e+01 2.934e+01 4.089e+01, threshold=5.369e+01, percent-clipped=0.0
2024-08-13 21:52:31,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2332430.0, ans=0.07
2024-08-13 21:52:41,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2332530.0, ans=0.125
2024-08-13 21:52:46,154 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS
2024-08-13 21:53:02,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0
2024-08-13 21:53:03,048 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS
2024-08-13 21:53:05,922 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts.
15 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 21:53:10,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1400, loss[loss=0.118, beats_loss=0.007911, ecapa_loss=0.0001416, whisper_loss=0.1087, over 21913.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001551, whisper_loss=0.08978, over 3856616.21 frames. ], batch size: 80, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:53:21,367 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 21:53:25,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2332830.0, ans=0.1 2024-08-13 21:53:29,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2332830.0, ans=0.125 2024-08-13 21:53:35,773 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 21:53:36,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2332830.0, ans=0.125 2024-08-13 21:53:38,479 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 21:53:47,860 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 21:53:50,832 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 21:53:51,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2332930.0, ans=0.1 2024-08-13 21:53:51,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.53 vs. 
limit=12.0 2024-08-13 21:53:52,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2333030.0, ans=0.125 2024-08-13 21:53:56,721 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 21:54:02,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2333030.0, ans=0.125 2024-08-13 21:54:24,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1450, loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001486, whisper_loss=0.08915, over 15079.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001546, whisper_loss=0.08963, over 3833408.93 frames. ], batch size: 59, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:54:55,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2024-08-13 21:54:56,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5 2024-08-13 21:55:01,757 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 21:55:19,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2333430.0, ans=0.5 2024-08-13 21:55:21,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.338e+01 2.604e+01 2.874e+01 4.710e+01, threshold=5.208e+01, percent-clipped=0.0 2024-08-13 21:55:57,096 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 21:56:01,156 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1500, loss[loss=0.09508, beats_loss=0.008354, ecapa_loss=0.00021, whisper_loss=0.08463, over 14165.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001548, whisper_loss=0.0899, over 3840641.93 frames. ], batch size: 59, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:56:36,646 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 21:56:46,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2334030.0, ans=0.1 2024-08-13 21:56:46,884 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 35 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 21:56:51,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2024-08-13 21:57:13,788 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-13 21:57:14,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1550, loss[loss=0.09809, beats_loss=0.01323, ecapa_loss=0.0001201, whisper_loss=0.08366, over 19679.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001554, whisper_loss=0.08984, over 3801539.67 frames. ], batch size: 79, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:57:45,936 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 21:57:48,938 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 21:57:51,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.314e+01 2.571e+01 2.868e+01 3.932e+01, threshold=5.142e+01, percent-clipped=0.0 2024-08-13 21:57:54,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2334430.0, ans=0.0 2024-08-13 21:58:00,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2334530.0, ans=0.125 2024-08-13 21:58:12,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2334530.0, ans=0.125 2024-08-13 21:58:22,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2334630.0, ans=0.125 2024-08-13 21:58:29,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1600, loss[loss=0.09944, beats_loss=0.01073, ecapa_loss=0.0001612, whisper_loss=0.0871, over 22635.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001547, whisper_loss=0.08977, over 3827858.56 frames. ], batch size: 93, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:58:30,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2334730.0, ans=0.09899494936611666 2024-08-13 21:58:41,627 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 21:58:47,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2334830.0, ans=0.125 2024-08-13 21:58:56,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-13 21:58:58,634 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 21:59:06,414 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 21:59:07,648 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 21:59:11,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2334930.0, ans=0.125 2024-08-13 21:59:15,694 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-13 21:59:19,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2335030.0, ans=0.1 2024-08-13 21:59:25,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2335030.0, ans=0.025 2024-08-13 21:59:37,515 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 21:59:41,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1650, loss[loss=0.1397, beats_loss=0.009483, ecapa_loss=0.0001538, whisper_loss=0.1287, over 23507.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001545, whisper_loss=0.09081, over 3827458.58 frames. 
], batch size: 92, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 21:59:46,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2335230.0, ans=0.125 2024-08-13 22:00:05,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2335330.0, ans=0.015 2024-08-13 22:00:15,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.351e+01 2.606e+01 2.894e+01 4.343e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-13 22:00:22,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=15.0 2024-08-13 22:00:26,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2335530.0, ans=0.2 2024-08-13 22:00:52,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1700, loss[loss=0.1117, beats_loss=0.01141, ecapa_loss=0.0001268, whisper_loss=0.09906, over 23652.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001546, whisper_loss=0.09067, over 3831340.88 frames. ], batch size: 88, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:01:01,343 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-13 22:01:09,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2335830.0, ans=0.125 2024-08-13 22:01:29,087 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 22:01:44,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2336030.0, ans=0.125 2024-08-13 22:01:47,488 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 22:01:52,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2336130.0, ans=0.0 2024-08-13 22:01:54,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2336130.0, ans=0.125 2024-08-13 22:02:02,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1750, loss[loss=0.1102, beats_loss=0.009005, ecapa_loss=0.0001962, whisper_loss=0.09925, over 21252.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001552, whisper_loss=0.09042, over 3817308.76 frames. ], batch size: 85, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:02:02,751 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 22:02:03,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-13 22:02:06,805 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 22:02:10,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2336230.0, ans=0.0 2024-08-13 22:02:15,652 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 22:02:25,585 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-13 22:02:35,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.335e+01 2.606e+01 3.098e+01 1.901e+02, threshold=5.212e+01, percent-clipped=3.0 2024-08-13 22:02:35,293 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 22:02:37,282 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 22:02:48,135 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 22:02:50,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2336530.0, ans=0.0 2024-08-13 22:02:51,000 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-13 22:02:58,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2336630.0, ans=0.125 2024-08-13 22:03:04,367 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 22:03:05,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2024-08-13 22:03:11,812 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1800, loss[loss=0.1105, beats_loss=0.009383, ecapa_loss=0.0001548, whisper_loss=0.09956, over 17309.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001566, whisper_loss=0.0905, over 3808461.74 frames. ], batch size: 65, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:03:14,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2336730.0, ans=0.05 2024-08-13 22:03:16,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2336730.0, ans=0.125 2024-08-13 22:03:24,052 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 22:03:24,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2336830.0, ans=0.125 2024-08-13 22:03:25,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2336830.0, ans=0.125 2024-08-13 22:03:34,892 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 22:03:41,042 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-13 22:03:45,254 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 22:04:02,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2024-08-13 22:04:03,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2337030.0, ans=0.125 2024-08-13 22:04:20,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1850, loss[loss=0.0723, beats_loss=0.01231, ecapa_loss=0.0001985, whisper_loss=0.058, over 14040.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001556, whisper_loss=0.09014, over 3795068.06 frames. ], batch size: 60, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:04:33,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2337330.0, ans=0.125 2024-08-13 22:04:33,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2337330.0, ans=0.1 2024-08-13 22:04:34,710 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 22:04:42,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2337330.0, ans=0.2 2024-08-13 22:04:48,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2337430.0, ans=0.125 2024-08-13 22:04:52,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.324e+01 2.518e+01 2.718e+01 4.142e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-13 22:05:14,164 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 22:05:29,694 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:05:29,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2337730.0, ans=15.0 2024-08-13 22:05:30,319 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1900, loss[loss=0.08659, beats_loss=0.0104, ecapa_loss=0.0001553, whisper_loss=0.07465, over 19794.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001559, whisper_loss=0.09, over 3804020.18 frames. ], batch size: 83, lr: 3.73e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:05:51,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=12.0 2024-08-13 22:05:59,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.70 vs. 
limit=22.5 2024-08-13 22:06:06,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2337930.0, ans=0.0 2024-08-13 22:06:20,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2338030.0, ans=0.125 2024-08-13 22:06:33,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2338030.0, ans=0.1 2024-08-13 22:06:52,657 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 1950, loss[loss=0.1058, beats_loss=0.01224, ecapa_loss=0.0001369, whisper_loss=0.0922, over 22263.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001562, whisper_loss=0.09006, over 3806750.33 frames. ], batch size: 85, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:07:16,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2338330.0, ans=0.0 2024-08-13 22:07:16,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=12.0 2024-08-13 22:07:20,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2338330.0, ans=0.125 2024-08-13 22:07:21,660 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 22:07:26,410 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 22:07:30,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.359e+01 2.594e+01 2.893e+01 6.920e+01, threshold=5.188e+01, percent-clipped=1.0 2024-08-13 22:07:41,071 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 22:08:13,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2000, loss[loss=0.1022, beats_loss=0.009258, ecapa_loss=0.0002139, whisper_loss=0.0908, over 20519.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001571, whisper_loss=0.09013, over 3816060.13 frames. ], batch size: 86, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:08:15,185 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 5 from Vox, 29 fro AS 2024-08-13 22:08:20,526 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.629e-03 2024-08-13 22:08:23,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2338730.0, ans=0.125 2024-08-13 22:08:31,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2338830.0, ans=0.0 2024-08-13 22:08:36,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2338830.0, ans=0.1 2024-08-13 22:08:54,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2338930.0, ans=0.2 2024-08-13 22:08:59,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2338930.0, ans=0.125 2024-08-13 22:09:03,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2339030.0, ans=0.125 2024-08-13 22:09:09,157 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 22:09:12,215 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 22:09:30,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2339130.0, ans=0.0 2024-08-13 22:09:35,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2050, loss[loss=0.101, beats_loss=0.009354, ecapa_loss=0.0001373, whisper_loss=0.09027, over 20208.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001568, whisper_loss=0.08973, over 3795702.12 frames. ], batch size: 75, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:09:39,758 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 22:09:50,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2339330.0, ans=0.125 2024-08-13 22:10:10,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2339430.0, ans=0.05 2024-08-13 22:10:13,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.646e+01 3.086e+01 1.043e+02, threshold=5.292e+01, percent-clipped=1.0 2024-08-13 22:10:13,293 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 22:10:23,122 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 22:10:31,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2339530.0, ans=0.025 2024-08-13 22:10:57,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2100, loss[loss=0.1214, beats_loss=0.01018, ecapa_loss=0.0001521, whisper_loss=0.1097, over 17359.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0108, ecapa_loss=0.000155, whisper_loss=0.08911, over 3783929.69 frames. 
], batch size: 65, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:11:13,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2339830.0, ans=0.0 2024-08-13 22:11:25,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2339830.0, ans=0.125 2024-08-13 22:11:40,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2339930.0, ans=0.09899494936611666 2024-08-13 22:12:14,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2150, loss[loss=0.1259, beats_loss=0.009386, ecapa_loss=0.0001496, whisper_loss=0.115, over 23066.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01075, ecapa_loss=0.0001552, whisper_loss=0.08943, over 3790602.12 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:12:18,886 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-13 22:12:19,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2340230.0, ans=0.0 2024-08-13 22:12:26,110 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-13 22:12:26,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2340230.0, ans=0.125 2024-08-13 22:12:54,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.324e+01 2.581e+01 2.963e+01 1.302e+02, threshold=5.163e+01, percent-clipped=1.0 2024-08-13 22:12:55,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2340430.0, ans=0.5 2024-08-13 22:12:59,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0 2024-08-13 22:13:19,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2340630.0, ans=0.125 2024-08-13 22:13:20,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2340630.0, ans=0.2 2024-08-13 22:13:33,916 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 22:13:36,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2200, loss[loss=0.1123, beats_loss=0.00801, ecapa_loss=0.0001741, whisper_loss=0.1025, over 21801.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001545, whisper_loss=0.09024, over 3798628.00 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:13:41,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. 
limit=10.0 2024-08-13 22:13:42,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2340730.0, ans=0.125 2024-08-13 22:14:00,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2340830.0, ans=0.0 2024-08-13 22:14:06,904 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 22:14:16,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2340930.0, ans=0.125 2024-08-13 22:14:31,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2341030.0, ans=0.125 2024-08-13 22:14:37,529 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 22:14:37,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2341030.0, ans=0.05 2024-08-13 22:14:42,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2341130.0, ans=0.1 2024-08-13 22:14:57,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2250, loss[loss=0.1061, beats_loss=0.01242, ecapa_loss=0.0001475, whisper_loss=0.09222, over 23140.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001548, whisper_loss=0.09087, over 3836041.81 frames. 
], batch size: 93, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:15:03,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2341230.0, ans=0.0 2024-08-13 22:15:06,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2341230.0, ans=0.1 2024-08-13 22:15:11,849 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 22:15:35,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.396e+01 2.660e+01 2.938e+01 1.173e+02, threshold=5.320e+01, percent-clipped=2.0 2024-08-13 22:15:39,549 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 22:15:41,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2024-08-13 22:15:44,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5 2024-08-13 22:16:09,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2341630.0, ans=0.0 2024-08-13 22:16:10,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2024-08-13 22:16:18,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2300, loss[loss=0.08877, beats_loss=0.01116, ecapa_loss=0.0001474, whisper_loss=0.07614, over 13063.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001562, whisper_loss=0.09125, over 3859917.80 frames. 
], batch size: 53, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:16:40,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2341830.0, ans=0.125 2024-08-13 22:16:45,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-08-13 22:16:55,640 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 22:16:58,983 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 23 from Vox, 15 fro AS 2024-08-13 22:17:02,476 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 22:17:21,661 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 22:17:38,273 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 22:17:39,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2350, loss[loss=0.1063, beats_loss=0.01035, ecapa_loss=0.0001668, whisper_loss=0.09425, over 19888.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01075, ecapa_loss=0.0001579, whisper_loss=0.09112, over 3811775.84 frames. ], batch size: 80, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:17:43,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2342230.0, ans=0.0 2024-08-13 22:17:47,046 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.652e+01 2024-08-13 22:17:50,926 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 22:18:07,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2342330.0, ans=0.125 2024-08-13 22:18:09,942 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 22:18:17,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2024-08-13 22:18:19,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2342430.0, ans=0.125 2024-08-13 22:18:19,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.389e+01 2.636e+01 2.881e+01 1.786e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 22:18:37,704 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 35 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 22:18:58,671 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 22:19:01,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2400, loss[loss=0.09823, beats_loss=0.01163, ecapa_loss=0.0001788, whisper_loss=0.08481, over 17967.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01069, ecapa_loss=0.0001588, whisper_loss=0.09172, over 3822990.10 frames. ], batch size: 72, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:19:01,482 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 22:19:04,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2342730.0, ans=0.2 2024-08-13 22:19:12,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. 
limit=15.0 2024-08-13 22:19:28,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2342830.0, ans=0.0 2024-08-13 22:19:36,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2342930.0, ans=0.0 2024-08-13 22:19:39,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2342930.0, ans=0.125 2024-08-13 22:20:11,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2343130.0, ans=0.0 2024-08-13 22:20:24,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2450, loss[loss=0.07614, beats_loss=0.01385, ecapa_loss=0.0001677, whisper_loss=0.06061, over 18813.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001577, whisper_loss=0.09096, over 3831006.82 frames. ], batch size: 80, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:20:28,732 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 22:20:32,217 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
21 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-13 22:20:32,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2343230.0, ans=0.2 2024-08-13 22:20:43,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2343330.0, ans=0.125 2024-08-13 22:20:46,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2343330.0, ans=0.2 2024-08-13 22:20:58,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2343430.0, ans=0.0 2024-08-13 22:21:05,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.291e+01 2.587e+01 2.997e+01 1.554e+02, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:21:15,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2343530.0, ans=0.125 2024-08-13 22:21:35,810 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 22:21:45,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=15.0 2024-08-13 22:21:47,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2500, loss[loss=0.1145, beats_loss=0.01025, ecapa_loss=0.0001655, whisper_loss=0.1026, over 22115.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001584, whisper_loss=0.09119, over 3836140.97 frames. 
], batch size: 87, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:22:20,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2343930.0, ans=0.0 2024-08-13 22:22:31,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2343930.0, ans=0.2 2024-08-13 22:22:31,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2343930.0, ans=0.125 2024-08-13 22:22:32,594 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 22:23:04,199 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-13 22:23:12,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2550, loss[loss=0.1052, beats_loss=0.008707, ecapa_loss=0.0001873, whisper_loss=0.09458, over 16494.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001572, whisper_loss=0.09107, over 3834805.29 frames. ], batch size: 65, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:23:23,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2344230.0, ans=0.2 2024-08-13 22:23:50,734 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 22:23:51,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2344430.0, ans=0.125 2024-08-13 22:23:53,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.329e+01 2.677e+01 3.229e+01 5.510e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-13 22:23:55,661 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
16 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 22:23:57,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2344430.0, ans=0.0 2024-08-13 22:23:59,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2024-08-13 22:24:14,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0 2024-08-13 22:24:15,221 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 22:24:17,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-13 22:24:35,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2600, loss[loss=0.09242, beats_loss=0.01189, ecapa_loss=0.0001224, whisper_loss=0.07931, over 18840.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01081, ecapa_loss=0.000157, whisper_loss=0.09035, over 3827123.05 frames. ], batch size: 71, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:24:36,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2344730.0, ans=0.0 2024-08-13 22:24:36,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2344730.0, ans=0.04949747468305833 2024-08-13 22:24:58,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2344830.0, ans=0.125 2024-08-13 22:25:09,667 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 22:25:15,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2024-08-13 22:25:20,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2344930.0, ans=0.125 2024-08-13 22:25:30,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2345030.0, ans=0.2 2024-08-13 22:25:31,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2024-08-13 22:25:33,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2345030.0, ans=0.125 2024-08-13 22:25:36,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2345030.0, ans=0.125 2024-08-13 22:25:36,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=12.0 2024-08-13 22:25:54,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2650, loss[loss=0.08054, beats_loss=0.01141, ecapa_loss=0.0001491, whisper_loss=0.06763, over 18077.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.000158, whisper_loss=0.09072, over 3846505.33 frames. 
], batch size: 70, lr: 3.72e-03, grad_scale: 5.764607523034235e+17 2024-08-13 22:26:03,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2345230.0, ans=0.125 2024-08-13 22:26:17,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2345330.0, ans=0.125 2024-08-13 22:26:30,853 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 22:26:31,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.316e+01 2.514e+01 2.879e+01 4.241e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-13 22:26:34,229 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 22:26:47,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2345530.0, ans=0.125 2024-08-13 22:26:55,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2345530.0, ans=0.0 2024-08-13 22:26:55,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2345530.0, ans=0.5 2024-08-13 22:26:58,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2345630.0, ans=0.2 2024-08-13 22:27:04,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2024-08-13 22:27:04,827 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 22:27:12,479 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 22:27:13,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2700, loss[loss=0.09532, beats_loss=0.01212, ecapa_loss=0.0001065, whisper_loss=0.08214, over 22202.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001576, whisper_loss=0.09133, over 3852184.08 frames. ], batch size: 84, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:27:14,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2345730.0, ans=0.125 2024-08-13 22:27:15,884 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.884e+00 2024-08-13 22:27:49,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2345930.0, ans=0.015 2024-08-13 22:28:01,902 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 22:28:26,964 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 22:28:32,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2750, loss[loss=0.1139, beats_loss=0.009576, ecapa_loss=0.0001393, whisper_loss=0.103, over 23489.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001577, whisper_loss=0.09115, over 3834320.98 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:28:49,533 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 22:29:11,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.410e+01 2.665e+01 3.029e+01 5.908e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 22:29:13,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2346430.0, ans=0.2 2024-08-13 22:29:18,179 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 18 from LS+wenet, 38 from Vox, 38 fro AS 2024-08-13 22:29:21,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2346530.0, ans=0.125 2024-08-13 22:29:32,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-08-13 22:29:36,242 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-13 22:29:46,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2346630.0, ans=0.0 2024-08-13 22:29:50,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2800, loss[loss=0.1288, beats_loss=0.008539, ecapa_loss=0.0002195, whisper_loss=0.1181, over 22831.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001579, whisper_loss=0.09142, over 3849956.97 frames. ], batch size: 92, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:29:54,225 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 22:29:56,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. 
limit=15.0 2024-08-13 22:30:01,171 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:30:08,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2346830.0, ans=0.125 2024-08-13 22:30:11,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2346830.0, ans=0.0 2024-08-13 22:30:23,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2346930.0, ans=0.025 2024-08-13 22:30:56,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2347130.0, ans=0.125 2024-08-13 22:31:05,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2347130.0, ans=0.125 2024-08-13 22:31:07,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-13 22:31:15,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2850, loss[loss=0.106, beats_loss=0.01085, ecapa_loss=0.0001529, whisper_loss=0.09362, over 20371.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001564, whisper_loss=0.09106, over 3829866.36 frames. ], batch size: 78, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:31:20,069 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
27 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 22:31:31,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2347330.0, ans=0.125 2024-08-13 22:31:39,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2347330.0, ans=0.125 2024-08-13 22:31:39,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2024-08-13 22:31:47,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2347430.0, ans=0.125 2024-08-13 22:31:52,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.368e+01 2.681e+01 3.083e+01 7.841e+01, threshold=5.363e+01, percent-clipped=3.0 2024-08-13 22:32:23,114 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 22:32:27,604 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 22:32:29,672 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 22:32:30,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2347630.0, ans=0.125 2024-08-13 22:32:32,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0 2024-08-13 22:32:43,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2900, loss[loss=0.0749, beats_loss=0.01287, ecapa_loss=0.0001543, whisper_loss=0.06049, over 18900.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001582, whisper_loss=0.09093, over 3828007.01 frames. 
], batch size: 76, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:32:53,029 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 22:32:55,155 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 22:33:09,717 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 22:33:30,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2347930.0, ans=0.5 2024-08-13 22:33:30,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-08-13 22:34:31,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 2950, loss[loss=0.1038, beats_loss=0.0115, ecapa_loss=0.0001341, whisper_loss=0.09099, over 19112.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01085, ecapa_loss=0.0001583, whisper_loss=0.09055, over 3845528.13 frames. ], batch size: 75, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:34:49,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2348230.0, ans=0.2 2024-08-13 22:34:54,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2348330.0, ans=0.125 2024-08-13 22:35:13,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2348330.0, ans=0.1 2024-08-13 22:35:14,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.52 vs. limit=10.0 2024-08-13 22:35:16,651 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 22:35:29,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.427e+01 2.649e+01 3.118e+01 1.077e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-13 22:35:30,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2348430.0, ans=0.04949747468305833 2024-08-13 22:35:47,860 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 22:35:58,579 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-13 22:36:27,928 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 22:36:29,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2348630.0, ans=0.2 2024-08-13 22:36:36,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2348730.0, ans=0.0 2024-08-13 22:36:37,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3000, loss[loss=0.1134, beats_loss=0.009364, ecapa_loss=0.0001757, whisper_loss=0.1023, over 22968.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001587, whisper_loss=0.09075, over 3874434.36 frames. ], batch size: 91, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:36:37,574 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-13 22:37:40,770 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005533, whisper_loss=0.2471, over 922467.00 frames. 2024-08-13 22:38:04,776 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames. 
2024-08-13 22:41:12,753 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 22:41:12,757 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-13 22:41:16,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2348730.0, ans=0.0 2024-08-13 22:41:22,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2348730.0, ans=0.2 2024-08-13 22:41:25,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2348730.0, ans=0.0 2024-08-13 22:41:40,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2348830.0, ans=0.125 2024-08-13 22:41:48,038 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 22:42:11,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2349030.0, ans=0.0 2024-08-13 22:42:32,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2349130.0, ans=0.1 2024-08-13 22:42:41,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3050, loss[loss=0.1173, beats_loss=0.01135, ecapa_loss=0.0001446, whisper_loss=0.1045, over 21814.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001585, whisper_loss=0.09057, over 3911318.84 frames. 
], batch size: 85, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:42:51,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2349230.0, ans=0.0 2024-08-13 22:42:55,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2349230.0, ans=0.2 2024-08-13 22:43:02,497 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-13 22:43:22,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2349430.0, ans=0.125 2024-08-13 22:43:24,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2349430.0, ans=0.125 2024-08-13 22:43:25,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.432e+01 2.716e+01 3.181e+01 1.148e+02, threshold=5.433e+01, percent-clipped=2.0 2024-08-13 22:43:38,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2024-08-13 22:43:46,797 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:44:00,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2349630.0, ans=0.2 2024-08-13 22:44:11,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2349730.0, ans=0.0 2024-08-13 22:44:11,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3100, loss[loss=0.104, beats_loss=0.01147, ecapa_loss=0.0001806, whisper_loss=0.09074, over 21509.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.0001587, whisper_loss=0.09084, over 3885771.74 frames. 
], batch size: 90, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:44:16,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2349730.0, ans=0.125 2024-08-13 22:44:39,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2349830.0, ans=0.0 2024-08-13 22:44:59,168 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 22:45:10,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-13 22:45:19,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2350130.0, ans=0.1 2024-08-13 22:45:37,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3150, loss[loss=0.07804, beats_loss=0.01395, ecapa_loss=0.0001699, whisper_loss=0.06239, over 15390.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01087, ecapa_loss=0.0001581, whisper_loss=0.08992, over 3873092.59 frames. ], batch size: 66, lr: 3.72e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:45:45,134 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-13 22:45:45,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2350230.0, ans=0.0 2024-08-13 22:45:50,539 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 22:45:52,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2350230.0, ans=0.125 2024-08-13 22:46:20,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.358e+01 2.601e+01 2.838e+01 4.154e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 22:46:30,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2350530.0, ans=0.0 2024-08-13 22:46:49,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2350630.0, ans=0.0 2024-08-13 22:46:55,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2350630.0, ans=0.125 2024-08-13 22:47:01,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2350630.0, ans=0.2 2024-08-13 22:47:03,335 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:47:07,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3200, loss[loss=0.09226, beats_loss=0.01362, ecapa_loss=0.0001237, whisper_loss=0.07741, over 16028.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001586, whisper_loss=0.09059, over 3882272.00 frames. ], batch size: 65, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:47:14,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2350730.0, ans=0.1 2024-08-13 22:47:31,219 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-13 22:47:33,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2024-08-13 22:47:42,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2350830.0, ans=0.125 2024-08-13 22:47:43,627 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS 2024-08-13 22:48:13,031 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 from AS 2024-08-13 22:48:35,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-08-13 22:48:37,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3250, loss[loss=0.1226, beats_loss=0.009314, ecapa_loss=0.0001631, whisper_loss=0.1117, over 19516.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001606, whisper_loss=0.09165, over 3899462.34 frames. ], batch size: 77, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:48:40,894 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-13 22:49:16,629 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 from AS 2024-08-13 22:49:18,316 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 22:49:19,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.385e+01 2.597e+01 2.999e+01 7.217e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 22:50:05,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3300, loss[loss=0.1187, beats_loss=0.01006, ecapa_loss=0.0001636, whisper_loss=0.107, over 22269.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001608, whisper_loss=0.0916, over 3925855.82 frames. ], batch size: 84, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:51:09,950 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 22:51:09,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2352030.0, ans=0.05 2024-08-13 22:51:30,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3350, loss[loss=0.1007, beats_loss=0.01167, ecapa_loss=0.0001398, whisper_loss=0.08759, over 18653.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001603, whisper_loss=0.09166, over 3882828.31 frames. ], batch size: 75, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:51:32,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2352230.0, ans=0.1 2024-08-13 22:52:00,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2352330.0, ans=0.0 2024-08-13 22:52:08,388 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS 2024-08-13 22:52:11,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.332e+01 2.587e+01 3.048e+01 7.749e+01, threshold=5.173e+01, percent-clipped=3.0 2024-08-13 22:52:23,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2352530.0, ans=0.1 2024-08-13 22:52:29,829 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS 2024-08-13 22:52:35,460 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 18 from Vox, 27 from AS 2024-08-13 22:52:36,954 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 from AS 2024-08-13 22:52:56,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3400, loss[loss=0.103, beats_loss=0.01276, ecapa_loss=0.0001619, whisper_loss=0.08859, over 21935.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01072, ecapa_loss=0.0001591, whisper_loss=0.09103, over 3848991.80 frames. ], batch size: 91, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:53:10,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2352730.0, ans=0.0 2024-08-13 22:53:43,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2352930.0, ans=0.1 2024-08-13 22:53:43,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2024-08-13 22:53:51,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2353030.0, ans=0.125 2024-08-13 22:54:00,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2353030.0, ans=0.0 2024-08-13 22:54:02,215 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 from AS 2024-08-13 22:54:02,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2353030.0, ans=0.125 2024-08-13 22:54:22,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.70 vs. 
limit=15.0 2024-08-13 22:54:23,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2353130.0, ans=0.1 2024-08-13 22:54:26,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3450, loss[loss=0.0932, beats_loss=0.01252, ecapa_loss=0.0001157, whisper_loss=0.07952, over 20315.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0001604, whisper_loss=0.0919, over 3866891.02 frames. ], batch size: 79, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:54:43,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-08-13 22:54:54,388 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 24 from Vox, 29 from AS 2024-08-13 22:55:00,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2353330.0, ans=0.0 2024-08-13 22:55:09,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.393e+01 2.606e+01 2.901e+01 5.659e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-13 22:55:18,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-13 22:55:30,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2353530.0, ans=0.125 2024-08-13 22:55:31,309 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-13 22:55:33,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2353530.0, ans=0.035 2024-08-13 22:55:45,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2353630.0, ans=0.125 2024-08-13 22:55:52,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3500, loss[loss=0.0876, beats_loss=0.01164, ecapa_loss=0.0001724, whisper_loss=0.07424, over 19695.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001612, whisper_loss=0.09157, over 3854577.70 frames. ], batch size: 84, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:56:01,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2353730.0, ans=0.125 2024-08-13 22:56:12,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2353830.0, ans=0.2 2024-08-13 22:56:24,313 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 from AS 2024-08-13 22:56:30,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2024-08-13 22:56:37,078 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 32 from Vox, 28 from AS 2024-08-13 22:56:40,439 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 13 from Vox, 37 from AS 2024-08-13 22:56:50,862 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 22 from Vox, 36 from AS 2024-08-13 22:57:15,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3550, loss[loss=0.1063, beats_loss=0.00752, ecapa_loss=0.0001936, whisper_loss=0.09687, over 17341.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.000162, whisper_loss=0.09103, over 3852716.72 frames. ], batch size: 69, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:57:35,534 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 20 from LS+wenet, 32 from Vox, 31 from AS 2024-08-13 22:57:47,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2354430.0, ans=0.0 2024-08-13 22:57:47,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-13 22:57:54,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2354430.0, ans=0.2 2024-08-13 22:57:54,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.375e+01 2.617e+01 2.958e+01 4.205e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 22:58:24,933 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 26 from Vox, 28 from AS 2024-08-13 22:58:36,915 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3600, loss[loss=0.1096, beats_loss=0.01012, ecapa_loss=0.0001729, whisper_loss=0.09777, over 19084.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001619, whisper_loss=0.09138, over 3885040.18 frames. ], batch size: 78, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 22:58:54,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2354830.0, ans=0.0 2024-08-13 22:58:59,737 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-13 22:59:17,024 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 22:59:34,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2355030.0, ans=0.0 2024-08-13 22:59:42,417 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 from AS 2024-08-13 22:59:56,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3650, loss[loss=0.1102, beats_loss=0.0115, ecapa_loss=0.0001643, whisper_loss=0.09701, over 18990.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01058, ecapa_loss=0.000162, whisper_loss=0.09206, over 3866755.20 frames. ], batch size: 77, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:00:28,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2355430.0, ans=0.0 2024-08-13 23:00:30,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2355430.0, ans=0.2 2024-08-13 23:00:34,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.445e+01 2.700e+01 3.239e+01 5.632e+01, threshold=5.401e+01, percent-clipped=1.0 2024-08-13 23:00:44,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2355530.0, ans=0.0 2024-08-13 23:00:52,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2355530.0, ans=0.0 2024-08-13 23:01:02,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2355630.0, ans=0.1 2024-08-13 23:01:15,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3700, loss[loss=0.09693, beats_loss=0.01043, ecapa_loss=0.0001839, whisper_loss=0.08466, over 20857.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001624, whisper_loss=0.09161, over 3858102.32 frames. 
], batch size: 90, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:01:20,477 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS 2024-08-13 23:01:22,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2355730.0, ans=0.125 2024-08-13 23:01:31,618 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 25 from Vox, 29 from AS 2024-08-13 23:01:50,182 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 from AS 2024-08-13 23:01:54,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2355930.0, ans=0.125 2024-08-13 23:02:16,026 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 from AS 2024-08-13 23:02:16,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2356030.0, ans=0.0 2024-08-13 23:02:34,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3750, loss[loss=0.103, beats_loss=0.01109, ecapa_loss=0.0001636, whisper_loss=0.09028, over 23017.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001629, whisper_loss=0.09178, over 3889477.62 frames. ], batch size: 92, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:02:38,918 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 27 from Vox, 26 from AS 2024-08-13 23:02:49,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2356330.0, ans=0.0 2024-08-13 23:02:57,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2356330.0, ans=0.125 2024-08-13 23:03:10,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.349e+01 2.622e+01 2.917e+01 8.940e+01, threshold=5.244e+01, percent-clipped=1.0 2024-08-13 23:03:20,493 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS 2024-08-13 23:03:20,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2356530.0, ans=0.0 2024-08-13 23:03:49,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3800, loss[loss=0.08829, beats_loss=0.0136, ecapa_loss=0.0001371, whisper_loss=0.07331, over 18081.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.0001636, whisper_loss=0.09152, over 3921025.44 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:04:15,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2356830.0, ans=0.1 2024-08-13 23:04:17,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2356830.0, ans=0.0 2024-08-13 23:04:19,161 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 from AS 2024-08-13 23:04:28,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.75 vs. 
limit=15.0 2024-08-13 23:04:32,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0 2024-08-13 23:04:52,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2357130.0, ans=0.125 2024-08-13 23:04:55,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2357130.0, ans=0.0 2024-08-13 23:04:57,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2357130.0, ans=0.125 2024-08-13 23:05:07,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3850, loss[loss=0.1263, beats_loss=0.008849, ecapa_loss=0.0001732, whisper_loss=0.1158, over 22165.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01069, ecapa_loss=0.0001628, whisper_loss=0.09221, over 3915337.23 frames. ], batch size: 87, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:05:07,433 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 from AS 2024-08-13 23:05:41,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2357430.0, ans=0.125 2024-08-13 23:05:44,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.323e+01 2.537e+01 2.804e+01 4.147e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-13 23:05:57,659 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 from AS 2024-08-13 23:06:12,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2357630.0, ans=0.0 2024-08-13 23:06:23,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3900, loss[loss=0.111, beats_loss=0.01122, ecapa_loss=0.0001478, whisper_loss=0.09831, over 18032.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01072, ecapa_loss=0.0001622, whisper_loss=0.09253, over 3910004.23 frames. ], batch size: 71, lr: 3.71e-03, grad_scale: 1.152921504606847e+18 2024-08-13 23:06:31,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2357730.0, ans=0.1 2024-08-13 23:06:41,906 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 23:06:45,265 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 from AS 2024-08-13 23:06:48,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2357830.0, ans=0.035 2024-08-13 23:06:49,910 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 13 from Vox, 19 from AS 2024-08-13 23:06:50,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2357830.0, ans=0.125 2024-08-13 23:06:53,425 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.389e-02 2024-08-13 23:06:56,061 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 from AS 2024-08-13 23:06:56,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2357930.0, ans=0.125 2024-08-13 23:06:58,902 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 from AS 2024-08-13 23:07:13,688 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
9 from LS+wenet, 21 from Vox, 30 from AS 2024-08-13 23:07:15,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2358030.0, ans=0.125 2024-08-13 23:07:41,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 3950, loss[loss=0.09975, beats_loss=0.01143, ecapa_loss=0.0001415, whisper_loss=0.08691, over 20419.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01063, ecapa_loss=0.0001635, whisper_loss=0.09327, over 3915517.52 frames. ], batch size: 79, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:07:42,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2358230.0, ans=0.0 2024-08-13 23:07:53,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2358230.0, ans=0.05 2024-08-13 23:08:20,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.511e+01 2.750e+01 3.070e+01 4.670e+01, threshold=5.499e+01, percent-clipped=0.0 2024-08-13 23:08:48,983 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 from AS 2024-08-13 23:08:57,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4000, loss[loss=0.09849, beats_loss=0.0116, ecapa_loss=0.0001438, whisper_loss=0.08545, over 18510.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01061, ecapa_loss=0.0001636, whisper_loss=0.09345, over 3936961.06 frames. 
], batch size: 73, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:09:21,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2358830.0, ans=0.125 2024-08-13 23:10:06,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2359130.0, ans=0.5 2024-08-13 23:10:15,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4050, loss[loss=0.09753, beats_loss=0.01038, ecapa_loss=0.0001567, whisper_loss=0.08557, over 19488.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01055, ecapa_loss=0.0001631, whisper_loss=0.0932, over 3915172.02 frames. ], batch size: 76, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:10:15,312 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 from AS 2024-08-13 23:10:15,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2359230.0, ans=0.125 2024-08-13 23:10:18,591 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:10:20,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2359230.0, ans=0.0 2024-08-13 23:10:22,646 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 13 from Vox, 28 from AS 2024-08-13 23:10:22,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2359230.0, ans=0.125 2024-08-13 23:10:25,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2359230.0, ans=0.125 2024-08-13 23:10:25,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2359230.0, ans=0.125 2024-08-13 23:10:29,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-13 23:10:34,864 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:10:37,550 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 23:10:39,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2359330.0, ans=0.07 2024-08-13 23:10:44,368 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 23 from Vox, 26 from AS 2024-08-13 23:10:44,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2359430.0, ans=0.125 2024-08-13 23:10:51,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.408e+01 2.659e+01 2.975e+01 6.287e+01, threshold=5.318e+01, percent-clipped=1.0 2024-08-13 23:10:54,976 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 23:10:57,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2359430.0, ans=0.1 2024-08-13 23:10:59,640 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
22 from LS+wenet, 14 from Vox, 22 from AS 2024-08-13 23:11:10,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=22.5 2024-08-13 23:11:23,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2359630.0, ans=0.125 2024-08-13 23:11:24,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2359630.0, ans=0.125 2024-08-13 23:11:29,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4100, loss[loss=0.09879, beats_loss=0.01284, ecapa_loss=0.0001824, whisper_loss=0.08413, over 17897.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01063, ecapa_loss=0.0001633, whisper_loss=0.09294, over 3896341.65 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:11:35,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2359730.0, ans=0.2 2024-08-13 23:11:36,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2359730.0, ans=0.0 2024-08-13 23:11:40,522 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS 2024-08-13 23:11:54,527 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 from AS 2024-08-13 23:11:58,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2024-08-13 23:12:00,908 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-13 23:12:17,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2360030.0, ans=10.0 2024-08-13 23:12:22,716 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 from AS 2024-08-13 23:12:44,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2360130.0, ans=0.0 2024-08-13 23:12:45,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2360130.0, ans=0.04949747468305833 2024-08-13 23:12:48,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4150, loss[loss=0.1286, beats_loss=0.00869, ecapa_loss=0.0001682, whisper_loss=0.1182, over 20445.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01067, ecapa_loss=0.0001619, whisper_loss=0.09302, over 3911411.48 frames. ], batch size: 79, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:12:51,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=12.0 2024-08-13 23:12:53,474 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-13 23:13:00,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2360230.0, ans=0.125 2024-08-13 23:13:05,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2360330.0, ans=0.1 2024-08-13 23:13:08,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2360330.0, ans=0.125 2024-08-13 23:13:20,657 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.014e+01 2024-08-13 23:13:22,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2360430.0, ans=0.125 2024-08-13 23:13:25,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.420e+01 2.616e+01 2.987e+01 7.044e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 23:13:29,408 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 from AS 2024-08-13 23:13:39,579 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 23:13:55,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2360630.0, ans=0.1 2024-08-13 23:14:02,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4200, loss[loss=0.09491, beats_loss=0.01222, ecapa_loss=0.0001401, whisper_loss=0.08129, over 22396.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01072, ecapa_loss=0.0001612, whisper_loss=0.09246, over 3878654.78 frames. 
], batch size: 91, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:14:33,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2360930.0, ans=0.2 2024-08-13 23:14:39,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-13 23:14:39,544 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 14 from Vox, 22 from AS 2024-08-13 23:14:40,817 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 from AS 2024-08-13 23:14:41,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2360930.0, ans=0.0 2024-08-13 23:14:41,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2360930.0, ans=0.125 2024-08-13 23:14:43,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2360930.0, ans=0.2 2024-08-13 23:14:59,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2024-08-13 23:15:05,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2361130.0, ans=0.125 2024-08-13 23:15:09,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. 
limit=15.0 2024-08-13 23:15:11,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2361230.0, ans=0.0 2024-08-13 23:15:12,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4250, loss[loss=0.0912, beats_loss=0.01025, ecapa_loss=0.0001446, whisper_loss=0.0795, over 14297.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001609, whisper_loss=0.09176, over 3857213.41 frames. ], batch size: 54, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:15:24,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2361330.0, ans=0.125 2024-08-13 23:15:44,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.294e+01 2.587e+01 2.870e+01 6.296e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-13 23:15:50,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2361530.0, ans=0.2 2024-08-13 23:15:57,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2361530.0, ans=0.2 2024-08-13 23:15:59,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-13 23:16:13,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2361630.0, ans=0.1 2024-08-13 23:16:14,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-08-13 23:16:17,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4300, loss[loss=0.116, beats_loss=0.009371, ecapa_loss=0.0001782, whisper_loss=0.1048, over 19007.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01071, ecapa_loss=0.0001617, whisper_loss=0.09223, over 3869262.97 frames. ], batch size: 77, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:16:24,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2361730.0, ans=0.1 2024-08-13 23:16:28,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2361730.0, ans=0.07 2024-08-13 23:17:49,733 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 23:17:53,450 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 23:18:05,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2362130.0, ans=0.0 2024-08-13 23:18:13,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4350, loss[loss=0.0919, beats_loss=0.01093, ecapa_loss=0.0001457, whisper_loss=0.07951, over 16518.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01063, ecapa_loss=0.0001621, whisper_loss=0.09265, over 3851084.38 frames. ], batch size: 65, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:18:13,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2362230.0, ans=0.0 2024-08-13 23:18:24,141 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 23:18:47,466 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 23:18:48,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2362430.0, ans=0.0 2024-08-13 23:18:49,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2362430.0, ans=0.125 2024-08-13 23:18:52,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.337e+01 2.576e+01 3.012e+01 4.056e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-13 23:19:08,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2362530.0, ans=0.1 2024-08-13 23:19:11,681 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 23:19:14,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2362530.0, ans=0.125 2024-08-13 23:19:15,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2362530.0, ans=0.1 2024-08-13 23:19:29,990 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 23:19:33,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4400, loss[loss=0.1124, beats_loss=0.01205, ecapa_loss=0.0001198, whisper_loss=0.09917, over 15706.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001614, whisper_loss=0.092, over 3869560.40 frames. ], batch size: 60, lr: 3.71e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:19:51,599 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 23:20:00,929 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 23:20:02,455 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 23:20:03,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2362830.0, ans=0.2 2024-08-13 23:20:03,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2362830.0, ans=0.125 2024-08-13 23:20:11,605 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 23:20:19,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2363030.0, ans=0.1 2024-08-13 23:20:23,826 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-13 23:20:26,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2363030.0, ans=0.0 2024-08-13 23:20:27,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2363030.0, ans=0.015 2024-08-13 23:20:31,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2363030.0, ans=0.125 2024-08-13 23:20:33,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2363030.0, ans=0.035 2024-08-13 23:20:35,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2363130.0, ans=0.2 2024-08-13 23:20:48,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4450, loss[loss=0.07061, beats_loss=0.01256, ecapa_loss=0.0001813, whisper_loss=0.05623, over 12781.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001614, whisper_loss=0.09217, over 3859589.86 frames. 
], batch size: 55, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:20:49,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2363230.0, ans=0.0 2024-08-13 23:20:52,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2363230.0, ans=0.0 2024-08-13 23:21:04,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5 2024-08-13 23:21:05,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-13 23:21:10,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=12.0 2024-08-13 23:21:28,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.409e+01 2.664e+01 2.942e+01 4.100e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 23:21:39,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2363530.0, ans=0.0 2024-08-13 23:21:52,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2363630.0, ans=0.125 2024-08-13 23:21:56,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2363630.0, ans=0.125 2024-08-13 23:22:09,831 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4500, loss[loss=0.1012, beats_loss=0.01246, ecapa_loss=0.0001544, whisper_loss=0.08716, over 22514.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01072, ecapa_loss=0.0001602, whisper_loss=0.09238, over 3920883.63 frames. 
], batch size: 94, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:22:16,269 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 23:22:33,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2363830.0, ans=0.0 2024-08-13 23:22:49,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2363930.0, ans=0.0 2024-08-13 23:23:06,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2364030.0, ans=0.1 2024-08-13 23:23:24,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4550, loss[loss=0.1056, beats_loss=0.01106, ecapa_loss=0.0001523, whisper_loss=0.09299, over 20647.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001611, whisper_loss=0.09179, over 3939706.20 frames. ], batch size: 81, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:23:25,055 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:23:29,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2364230.0, ans=0.125 2024-08-13 23:23:33,303 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 23:23:49,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2364330.0, ans=0.125 2024-08-13 23:24:00,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.368e+01 2.686e+01 2.952e+01 5.692e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-13 23:24:00,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2364430.0, ans=0.125 2024-08-13 23:24:10,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2364530.0, ans=0.125 2024-08-13 23:24:18,539 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 23:24:19,665 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 23:24:23,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2364630.0, ans=0.125 2024-08-13 23:24:24,852 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 23:24:29,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5 2024-08-13 23:24:33,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4600, loss[loss=0.09574, beats_loss=0.007904, ecapa_loss=0.0001425, whisper_loss=0.08641, over 15588.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001603, whisper_loss=0.09194, over 3932151.24 frames. 
], batch size: 55, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:24:48,831 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:24:56,280 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-13 23:25:00,228 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 23:25:25,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2024-08-13 23:25:26,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=12.0 2024-08-13 23:25:39,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2365130.0, ans=0.0 2024-08-13 23:25:41,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4650, loss[loss=0.1071, beats_loss=0.01087, ecapa_loss=0.0001781, whisper_loss=0.09449, over 16239.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001604, whisper_loss=0.09193, over 3922971.94 frames. ], batch size: 65, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:25:43,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2365230.0, ans=0.125 2024-08-13 23:26:09,753 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 23:26:12,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2365430.0, ans=0.125 2024-08-13 23:26:15,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.449e+01 2.734e+01 2.969e+01 1.115e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-13 23:26:33,458 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 23:26:41,515 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 23:26:47,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4700, loss[loss=0.07381, beats_loss=0.01294, ecapa_loss=0.0001387, whisper_loss=0.05948, over 16400.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01078, ecapa_loss=0.0001589, whisper_loss=0.0913, over 3912408.43 frames. ], batch size: 67, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:27:09,575 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-13 23:27:16,713 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 23:27:18,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2365930.0, ans=0.1 2024-08-13 23:27:27,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2366030.0, ans=0.125 2024-08-13 23:27:39,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2366130.0, ans=0.125 2024-08-13 23:27:41,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2366130.0, ans=0.1 2024-08-13 23:27:52,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4750, loss[loss=0.1096, beats_loss=0.01063, ecapa_loss=0.0001554, whisper_loss=0.09744, over 18229.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.09166, over 3895876.27 frames. ], batch size: 74, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:27:53,506 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.913e+01 2024-08-13 23:28:08,627 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
18 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 23:28:20,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2366430.0, ans=0.125 2024-08-13 23:28:25,300 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.418e+01 2.670e+01 2.931e+01 4.166e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-13 23:28:25,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2366430.0, ans=0.2 2024-08-13 23:28:29,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2366430.0, ans=0.035 2024-08-13 23:28:34,350 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 23:28:57,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4800, loss[loss=0.08611, beats_loss=0.01141, ecapa_loss=0.0001812, whisper_loss=0.07289, over 16179.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001592, whisper_loss=0.09097, over 3891730.99 frames. ], batch size: 66, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:28:59,577 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 23:29:04,900 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:29:06,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2366730.0, ans=0.0 2024-08-13 23:29:09,781 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 23:29:13,877 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
39 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 23:29:18,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-13 23:29:29,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-08-13 23:29:46,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0 2024-08-13 23:29:52,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2367130.0, ans=0.02 2024-08-13 23:29:59,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2367130.0, ans=0.1 2024-08-13 23:30:02,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4850, loss[loss=0.1087, beats_loss=0.01077, ecapa_loss=0.0001474, whisper_loss=0.09644, over 21974.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001595, whisper_loss=0.09108, over 3909197.20 frames. ], batch size: 87, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:30:04,055 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 23:30:19,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2367330.0, ans=0.125 2024-08-13 23:30:27,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2367430.0, ans=0.125 2024-08-13 23:30:32,454 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
15 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-13 23:30:35,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.351e+01 2.637e+01 2.912e+01 5.043e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-13 23:30:50,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2367530.0, ans=0.0 2024-08-13 23:31:07,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4900, loss[loss=0.09377, beats_loss=0.0121, ecapa_loss=0.0001353, whisper_loss=0.08031, over 16492.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0109, ecapa_loss=0.0001596, whisper_loss=0.09064, over 3866428.77 frames. ], batch size: 64, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:31:08,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2367730.0, ans=0.09899494936611666 2024-08-13 23:31:14,905 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 23:31:15,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2367730.0, ans=0.2 2024-08-13 23:31:16,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-13 23:31:26,329 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-13 23:31:34,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2367930.0, ans=0.0 2024-08-13 23:31:38,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2367930.0, ans=0.125 2024-08-13 23:31:39,181 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 23:31:48,392 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 23:31:49,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2368030.0, ans=0.0 2024-08-13 23:31:58,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2368030.0, ans=0.2 2024-08-13 23:32:01,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2368130.0, ans=0.125 2024-08-13 23:32:12,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2368230.0, ans=0.0 2024-08-13 23:32:13,299 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 4950, loss[loss=0.09138, beats_loss=0.01251, ecapa_loss=0.0001438, whisper_loss=0.07743, over 13258.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001615, whisper_loss=0.09107, over 3863775.85 frames. ], batch size: 54, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:32:16,484 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 23:32:46,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.294e+01 2.547e+01 2.845e+01 3.862e+01, threshold=5.095e+01, percent-clipped=0.0 2024-08-13 23:32:54,414 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 23:33:11,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2368630.0, ans=0.125 2024-08-13 23:33:15,107 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 23:33:15,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2024-08-13 23:33:16,662 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 23:33:19,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5000, loss[loss=0.1155, beats_loss=0.01056, ecapa_loss=0.0001506, whisper_loss=0.1034, over 22825.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001617, whisper_loss=0.0911, over 3842174.06 frames. ], batch size: 88, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:33:19,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2368730.0, ans=0.125 2024-08-13 23:33:21,717 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 23:33:25,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2368730.0, ans=0.2 2024-08-13 23:33:33,483 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-13 23:33:53,589 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 23:34:17,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2369130.0, ans=0.125 2024-08-13 23:34:17,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2369130.0, ans=0.125 2024-08-13 23:34:23,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5050, loss[loss=0.1034, beats_loss=0.01152, ecapa_loss=0.0001704, whisper_loss=0.09022, over 17626.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.000161, whisper_loss=0.09093, over 3864297.87 frames. ], batch size: 74, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:34:40,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-13 23:34:43,883 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 23:34:45,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2369330.0, ans=0.125 2024-08-13 23:34:49,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2369430.0, ans=0.1 2024-08-13 23:34:55,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.306e+01 2.530e+01 2.921e+01 5.103e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-13 23:34:59,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2369430.0, ans=0.07 2024-08-13 23:35:03,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2369530.0, ans=0.125 2024-08-13 23:35:04,528 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 23:35:17,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2369630.0, ans=0.2 2024-08-13 23:35:19,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2369630.0, ans=0.1 2024-08-13 23:35:21,242 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. 
limit=22.5 2024-08-13 23:35:23,019 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 23:35:24,212 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 23:35:27,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5100, loss[loss=0.131, beats_loss=0.009194, ecapa_loss=0.0001532, whisper_loss=0.1203, over 24216.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001602, whisper_loss=0.09112, over 3851091.50 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:35:30,409 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 23:35:48,503 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 23:35:52,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2369930.0, ans=0.5 2024-08-13 23:35:56,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2369930.0, ans=0.125 2024-08-13 23:36:14,309 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 23:36:17,048 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 23:36:18,346 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 16 from Vox, 53 fro AS 2024-08-13 23:36:23,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2370130.0, ans=0.1 2024-08-13 23:36:32,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5150, loss[loss=0.09335, beats_loss=0.01075, ecapa_loss=0.0001657, whisper_loss=0.08094, over 14679.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001602, whisper_loss=0.09147, over 3854566.14 frames. ], batch size: 57, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:36:35,055 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 23:36:44,478 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 23:36:46,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-13 23:36:47,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2370330.0, ans=0.125 2024-08-13 23:36:50,737 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 23:37:05,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.435e+01 2.636e+01 3.072e+01 5.034e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-13 23:37:07,814 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 23:37:14,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.24 vs. limit=8.0 2024-08-13 23:37:17,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2370530.0, ans=0.125 2024-08-13 23:37:23,153 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 23:37:24,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2370630.0, ans=0.0 2024-08-13 23:37:28,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2370630.0, ans=10.0 2024-08-13 23:37:36,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2370730.0, ans=0.1 2024-08-13 23:37:37,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5200, loss[loss=0.07162, beats_loss=0.01318, ecapa_loss=0.0001678, whisper_loss=0.05676, over 20114.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001586, whisper_loss=0.09138, over 3864285.88 frames. ], batch size: 86, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:37:51,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2370830.0, ans=0.07 2024-08-13 23:37:58,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2370830.0, ans=0.125 2024-08-13 23:38:03,986 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 23:38:10,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2370930.0, ans=0.125 2024-08-13 23:38:16,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2371030.0, ans=0.025 2024-08-13 23:38:30,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2371130.0, ans=0.04949747468305833 2024-08-13 23:38:40,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5250, loss[loss=0.1235, beats_loss=0.01013, ecapa_loss=0.0001329, whisper_loss=0.112, over 23519.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01081, ecapa_loss=0.0001579, whisper_loss=0.09103, over 3833254.40 frames. ], batch size: 94, lr: 3.70e-03, grad_scale: 5.764607523034235e+17 2024-08-13 23:38:53,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2371330.0, ans=0.125 2024-08-13 23:39:05,707 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 23:39:13,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.304e+01 2.584e+01 2.839e+01 8.080e+01, threshold=5.168e+01, percent-clipped=1.0 2024-08-13 23:39:17,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2371430.0, ans=0.125 2024-08-13 23:39:34,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-13 23:39:41,516 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
22 from LS+wenet, 24 from Vox, 45 from AS
2024-08-13 23:39:43,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2371630.0, ans=0.125
2024-08-13 23:39:45,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5300, loss[loss=0.1153, beats_loss=0.009017, ecapa_loss=0.0001494, whisper_loss=0.1048, over 22027.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.09206, over 3858528.50 frames. ], batch size: 84, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:39:53,387 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS
2024-08-13 23:40:02,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2371830.0, ans=0.125
2024-08-13 23:40:04,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2371830.0, ans=0.1
2024-08-13 23:40:11,075 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 15 from LS+wenet, 26 from Vox, 34 from AS
2024-08-13 23:40:19,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2371930.0, ans=0.2
2024-08-13 23:40:34,132 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 from AS
2024-08-13 23:40:37,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2372130.0, ans=0.125
2024-08-13 23:40:39,327 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 32 from Vox, 27 from AS
2024-08-13 23:40:39,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2372130.0, ans=0.125
2024-08-13 23:40:47,866 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts.
20 from LS+wenet, 16 from Vox, 46 from AS
2024-08-13 23:40:48,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5350, loss[loss=0.0871, beats_loss=0.01426, ecapa_loss=0.0001167, whisper_loss=0.07168, over 20476.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001586, whisper_loss=0.09133, over 3858008.45 frames. ], batch size: 82, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:41:13,833 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 12 from Vox, 28 from AS
2024-08-13 23:41:21,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.441e+01 2.659e+01 2.902e+01 4.183e+01, threshold=5.318e+01, percent-clipped=0.0
2024-08-13 23:41:21,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2372430.0, ans=0.1
2024-08-13 23:41:26,514 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 from AS
2024-08-13 23:41:34,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2372530.0, ans=0.125
2024-08-13 23:41:38,430 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 22 from Vox, 37 from AS
2024-08-13 23:41:44,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2024-08-13 23:41:49,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2372630.0, ans=0.125
2024-08-13 23:41:51,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs.
limit=6.0
2024-08-13 23:41:53,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5400, loss[loss=0.1085, beats_loss=0.01067, ecapa_loss=0.0001642, whisper_loss=0.09618, over 15137.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001589, whisper_loss=0.09136, over 3867388.28 frames. ], batch size: 59, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:41:56,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=12.0
2024-08-13 23:41:58,611 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 from AS
2024-08-13 23:42:11,116 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS
2024-08-13 23:42:17,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0
2024-08-13 23:42:34,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=22.5
2024-08-13 23:42:41,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2373030.0, ans=0.1
2024-08-13 23:42:46,477 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 from AS
2024-08-13 23:42:46,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2373130.0, ans=0.125
2024-08-13 23:42:51,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2373130.0, ans=0.2
2024-08-13 23:42:57,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5450, loss[loss=0.113, beats_loss=0.008949, ecapa_loss=0.0001298, whisper_loss=0.1028, over 15899.00 frames.
], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001604, whisper_loss=0.09198, over 3884008.95 frames. ], batch size: 54, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:43:04,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2373230.0, ans=0.125
2024-08-13 23:43:09,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2373330.0, ans=0.1
2024-08-13 23:43:11,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2373330.0, ans=0.1
2024-08-13 23:43:25,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2373430.0, ans=0.0
2024-08-13 23:43:27,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2373430.0, ans=0.125
2024-08-13 23:43:28,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0
2024-08-13 23:43:29,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.305e+01 2.546e+01 2.870e+01 4.387e+01, threshold=5.093e+01, percent-clipped=0.0
2024-08-13 23:43:31,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2373430.0, ans=0.0
2024-08-13 23:43:45,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2373530.0, ans=0.0
2024-08-13 23:43:55,839 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
28 from LS+wenet, 19 from Vox, 45 from AS
2024-08-13 23:44:02,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5500, loss[loss=0.1171, beats_loss=0.008515, ecapa_loss=0.0001331, whisper_loss=0.1073, over 14784.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01057, ecapa_loss=0.0001591, whisper_loss=0.09243, over 3890040.51 frames. ], batch size: 55, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:44:03,969 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 33 from Vox, 34 from AS
2024-08-13 23:44:11,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2373730.0, ans=0.1
2024-08-13 23:44:11,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.15 vs. limit=10.0
2024-08-13 23:44:17,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2373830.0, ans=0.125
2024-08-13 23:44:31,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2373930.0, ans=0.125
2024-08-13 23:44:37,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2373930.0, ans=0.0
2024-08-13 23:45:10,872 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 from AS
2024-08-13 23:45:14,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5550, loss[loss=0.1138, beats_loss=0.01036, ecapa_loss=0.0001541, whisper_loss=0.1019, over 23144.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01058, ecapa_loss=0.0001585, whisper_loss=0.09291, over 3919997.77 frames.
], batch size: 91, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:45:47,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2374430.0, ans=0.2
2024-08-13 23:45:51,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.310e+01 2.523e+01 2.896e+01 4.190e+01, threshold=5.046e+01, percent-clipped=0.0
2024-08-13 23:46:01,997 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:46:06,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2374530.0, ans=0.015
2024-08-13 23:46:10,236 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.734e+01
2024-08-13 23:46:17,073 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS
2024-08-13 23:46:21,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2374630.0, ans=0.0
2024-08-13 23:46:26,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5600, loss[loss=0.1013, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.08909, over 17663.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01067, ecapa_loss=0.0001593, whisper_loss=0.09244, over 3916935.30 frames.
], batch size: 69, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:46:36,869 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:46:51,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2374830.0, ans=0.0
2024-08-13 23:46:59,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2374930.0, ans=0.125
2024-08-13 23:47:01,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2374930.0, ans=0.0
2024-08-13 23:47:01,493 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.586e+05
2024-08-13 23:47:07,114 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS
2024-08-13 23:47:14,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2375030.0, ans=0.125
2024-08-13 23:47:16,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2375030.0, ans=0.95
2024-08-13 23:47:39,154 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-13 23:47:39,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5650, loss[loss=0.09596, beats_loss=0.01176, ecapa_loss=0.0001435, whisper_loss=0.08276, over 17846.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001582, whisper_loss=0.09166, over 3918737.13 frames.
], batch size: 71, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:47:43,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2375230.0, ans=0.0
2024-08-13 23:47:43,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2375230.0, ans=0.1
2024-08-13 23:47:53,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2375330.0, ans=0.125
2024-08-13 23:48:01,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2375330.0, ans=0.0
2024-08-13 23:48:01,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2375330.0, ans=0.125
2024-08-13 23:48:13,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.432e+01 2.622e+01 2.958e+01 1.611e+02, threshold=5.244e+01, percent-clipped=2.0
2024-08-13 23:48:41,151 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 23:48:46,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5700, loss[loss=0.1079, beats_loss=0.01002, ecapa_loss=0.0001664, whisper_loss=0.09621, over 18156.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.0001589, whisper_loss=0.09141, over 3943378.86 frames.
], batch size: 71, lr: 3.70e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:48:46,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2375730.0, ans=0.125
2024-08-13 23:48:53,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2375730.0, ans=0.0
2024-08-13 23:48:54,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2375730.0, ans=0.125
2024-08-13 23:49:04,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2375830.0, ans=0.2
2024-08-13 23:49:20,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2375930.0, ans=0.125
2024-08-13 23:49:29,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2376030.0, ans=0.0
2024-08-13 23:49:30,844 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 23:49:39,806 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 23:49:40,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2376030.0, ans=0.0
2024-08-13 23:49:52,343 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 from AS
2024-08-13 23:49:57,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5750, loss[loss=0.1031, beats_loss=0.008421, ecapa_loss=0.0001987, whisper_loss=0.09267, over 18225.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001602, whisper_loss=0.0911, over 3951680.17 frames.
], batch size: 75, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:50:09,945 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 from AS
2024-08-13 23:50:13,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0
2024-08-13 23:50:15,551 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 from AS
2024-08-13 23:50:32,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.376e+01 2.677e+01 2.886e+01 5.408e+01, threshold=5.355e+01, percent-clipped=1.0
2024-08-13 23:50:33,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.32 vs. limit=22.5
2024-08-13 23:50:36,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2376430.0, ans=0.125
2024-08-13 23:50:44,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0
2024-08-13 23:50:54,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2376630.0, ans=0.125
2024-08-13 23:51:01,022 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS
2024-08-13 23:51:08,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2376730.0, ans=0.1
2024-08-13 23:51:09,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5800, loss[loss=0.08638, beats_loss=0.011, ecapa_loss=0.0001936, whisper_loss=0.07344, over 15623.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001601, whisper_loss=0.09087, over 3909216.11 frames.
], batch size: 64, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:51:20,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2376730.0, ans=0.125
2024-08-13 23:52:00,717 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 23:52:01,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0
2024-08-13 23:52:18,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5850, loss[loss=0.09114, beats_loss=0.01364, ecapa_loss=0.00011, whisper_loss=0.0764, over 21150.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001599, whisper_loss=0.09099, over 3919146.44 frames. ], batch size: 83, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:52:27,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2377230.0, ans=0.125
2024-08-13 23:52:30,473 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 from AS
2024-08-13 23:52:33,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2377330.0, ans=0.5
2024-08-13 23:52:34,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2377330.0, ans=0.125
2024-08-13 23:52:36,927 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS
2024-08-13 23:52:47,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=2377430.0, ans=0.1
2024-08-13 23:52:49,795 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
25 from LS+wenet, 22 from Vox, 40 from AS
2024-08-13 23:52:50,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.427e+01 2.667e+01 3.028e+01 6.435e+01, threshold=5.335e+01, percent-clipped=1.0
2024-08-13 23:52:51,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2377430.0, ans=0.1
2024-08-13 23:53:12,251 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 from AS
2024-08-13 23:53:16,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2377630.0, ans=0.125
2024-08-13 23:53:18,671 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 from AS
2024-08-13 23:53:23,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5900, loss[loss=0.09141, beats_loss=0.01167, ecapa_loss=0.0001572, whisper_loss=0.07817, over 16682.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001598, whisper_loss=0.09025, over 3907824.63 frames. ], batch size: 67, lr: 3.69e-03, grad_scale: 5.764607523034235e+17
2024-08-13 23:53:24,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2377730.0, ans=0.0
2024-08-13 23:53:26,365 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS
2024-08-13 23:53:34,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2377730.0, ans=0.125
2024-08-13 23:53:44,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.88 vs. limit=22.5
2024-08-13 23:53:46,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs.
limit=22.5
2024-08-13 23:53:51,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2377930.0, ans=0.125
2024-08-13 23:53:57,797 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 from AS
2024-08-13 23:54:14,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2378130.0, ans=0.1
2024-08-13 23:54:16,897 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 12 from Vox, 24 from AS
2024-08-13 23:54:25,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2378130.0, ans=0.125
2024-08-13 23:54:28,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 5950, loss[loss=0.09916, beats_loss=0.01024, ecapa_loss=0.000164, whisper_loss=0.08728, over 18687.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01085, ecapa_loss=0.0001605, whisper_loss=0.09049, over 3918728.93 frames. ], batch size: 75, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-13 23:54:55,454 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 from AS
2024-08-13 23:55:00,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.346e+01 2.593e+01 2.833e+01 5.502e+01, threshold=5.186e+01, percent-clipped=1.0
2024-08-13 23:55:01,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=15.0
2024-08-13 23:55:04,685 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.626e+00
2024-08-13 23:55:05,498 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 23:55:06,783 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
28 from LS+wenet, 20 from Vox, 40 from AS
2024-08-13 23:55:10,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2378530.0, ans=0.125
2024-08-13 23:55:22,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2378630.0, ans=0.125
2024-08-13 23:55:24,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2378630.0, ans=0.1
2024-08-13 23:55:32,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6000, loss[loss=0.1045, beats_loss=0.01059, ecapa_loss=0.0001386, whisper_loss=0.09254, over 22070.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001598, whisper_loss=0.09072, over 3893170.62 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-13 23:55:32,579 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-13 23:56:14,162 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005558, whisper_loss=0.2472, over 922467.00 frames.
2024-08-13 23:56:35,125 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on SV_voxceleb1: loss=0.004377, beats_loss=0, ecapa_loss=0.0004377, whisper_loss=0, over 939242.00 frames.
2024-08-13 23:58:33,259 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on AT_audioset: loss=0.02362, beats_loss=0.02362, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 23:58:33,268 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB
2024-08-13 23:58:40,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2378730.0, ans=0.0
2024-08-13 23:58:43,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5
2024-08-13 23:58:54,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2378830.0, ans=0.125
2024-08-13 23:59:04,834 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 from AS
2024-08-13 23:59:20,711 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09559547156095505, model_norm_threshold=51.8635368347168
2024-08-13 23:59:20,936 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.554e+04, grad_sumsq=7.554e+04, orig_rms_sq=1.000e+00
2024-08-13 23:59:29,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=2379130.0, ans=0.02
2024-08-13 23:59:36,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0
2024-08-13 23:59:44,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6050, loss[loss=0.07558, beats_loss=0.01317, ecapa_loss=0.0001325, whisper_loss=0.06109, over 14583.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.000161, whisper_loss=0.09083, over 3879935.54 frames.
], batch size: 58, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-13 23:59:47,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0
2024-08-13 23:59:48,749 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 13 from Vox, 33 from AS
2024-08-13 23:59:57,017 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 23:59:57,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2379330.0, ans=0.2
2024-08-14 00:00:16,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.56 vs. limit=22.5
2024-08-14 00:00:18,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2379430.0, ans=0.125
2024-08-14 00:00:20,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.343e+01 2.535e+01 2.756e+01 5.425e+02, threshold=5.070e+01, percent-clipped=3.0
2024-08-14 00:00:23,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2379430.0, ans=0.2
2024-08-14 00:00:26,287 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.846e+05
2024-08-14 00:00:36,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2379530.0, ans=0.0
2024-08-14 00:00:42,413 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 24 from LS+wenet, 26 from Vox, 47 from AS
2024-08-14 00:00:51,791 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
22 from LS+wenet, 20 from Vox, 33 from AS
2024-08-14 00:00:52,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2379630.0, ans=0.1
2024-08-14 00:00:57,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6100, loss[loss=0.07495, beats_loss=0.01158, ecapa_loss=0.0001489, whisper_loss=0.06188, over 14429.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.0001604, whisper_loss=0.09067, over 3881478.88 frames. ], batch size: 59, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-14 00:01:43,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2380030.0, ans=0.125
2024-08-14 00:01:46,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2380030.0, ans=0.025
2024-08-14 00:01:47,801 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 from AS
2024-08-14 00:02:04,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6150, loss[loss=0.09921, beats_loss=0.01064, ecapa_loss=0.0001675, whisper_loss=0.0869, over 18676.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01094, ecapa_loss=0.000161, whisper_loss=0.09037, over 3878766.06 frames. ], batch size: 74, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-14 00:02:14,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2380230.0, ans=0.125
2024-08-14 00:02:17,449 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 from AS
2024-08-14 00:02:19,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2380330.0, ans=0.0
2024-08-14 00:02:31,659 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
16 from LS+wenet, 23 from Vox, 27 from AS
2024-08-14 00:02:35,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2380430.0, ans=0.025
2024-08-14 00:02:36,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.475e+01 2.774e+01 3.233e+01 4.746e+01, threshold=5.548e+01, percent-clipped=1.0
2024-08-14 00:02:46,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2380530.0, ans=0.1
2024-08-14 00:02:51,390 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 30 from Vox, 29 from AS
2024-08-14 00:03:01,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2380630.0, ans=0.0
2024-08-14 00:03:09,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6200, loss[loss=0.09738, beats_loss=0.01196, ecapa_loss=0.0001231, whisper_loss=0.08419, over 18455.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01093, ecapa_loss=0.0001592, whisper_loss=0.09032, over 3906693.67 frames. ], batch size: 70, lr: 3.69e-03, grad_scale: 1.152921504606847e+18
2024-08-14 00:03:12,218 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 from AS
2024-08-14 00:03:46,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2380930.0, ans=0.0
2024-08-14 00:03:50,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2381030.0, ans=0.0
2024-08-14 00:03:51,510 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 00:04:09,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2381130.0, ans=0.125 2024-08-14 00:04:15,107 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6250, loss[loss=0.0852, beats_loss=0.0105, ecapa_loss=0.0001758, whisper_loss=0.07295, over 17629.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01092, ecapa_loss=0.0001595, whisper_loss=0.09041, over 3937808.31 frames. ], batch size: 71, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:04:15,516 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-14 00:04:21,631 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 00:04:26,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2381330.0, ans=0.125 2024-08-14 00:04:26,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2381330.0, ans=0.0 2024-08-14 00:04:28,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2381330.0, ans=0.0 2024-08-14 00:04:48,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.400e+01 2.693e+01 3.116e+01 1.076e+02, threshold=5.386e+01, percent-clipped=3.0 2024-08-14 00:04:58,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2381530.0, ans=0.0 2024-08-14 00:05:02,876 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 00:05:19,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6300, loss[loss=0.1029, beats_loss=0.01251, ecapa_loss=0.0001134, whisper_loss=0.08929, over 24051.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001603, whisper_loss=0.08997, over 3925087.94 frames. ], batch size: 92, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:05:28,314 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 00:05:33,511 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 00:05:34,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2381830.0, ans=0.0 2024-08-14 00:05:36,737 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.153e-02 2024-08-14 00:05:45,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2381930.0, ans=0.125 2024-08-14 00:06:03,466 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 00:06:03,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2382030.0, ans=0.0 2024-08-14 00:06:22,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2382230.0, ans=0.125 2024-08-14 00:06:23,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6350, loss[loss=0.09638, beats_loss=0.009078, ecapa_loss=0.000218, whisper_loss=0.08513, over 20489.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001609, whisper_loss=0.09075, over 3900408.35 frames. 
], batch size: 91, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:06:24,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2382230.0, ans=0.125 2024-08-14 00:06:41,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2382330.0, ans=0.125 2024-08-14 00:06:43,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2382330.0, ans=0.125 2024-08-14 00:06:44,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-14 00:06:47,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2382330.0, ans=10.0 2024-08-14 00:06:47,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2382330.0, ans=0.125 2024-08-14 00:06:54,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2382430.0, ans=0.1 2024-08-14 00:06:54,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2382430.0, ans=0.05 2024-08-14 00:06:57,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. 
limit=6.0 2024-08-14 00:06:57,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.344e+01 2.620e+01 2.945e+01 1.011e+02, threshold=5.239e+01, percent-clipped=2.0 2024-08-14 00:07:02,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2382530.0, ans=0.2 2024-08-14 00:07:08,258 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 00:07:23,737 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 00:07:28,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6400, loss[loss=0.1184, beats_loss=0.01059, ecapa_loss=0.0001539, whisper_loss=0.1063, over 20973.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001614, whisper_loss=0.09169, over 3908459.41 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:07:34,401 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 00:07:45,937 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 00:07:48,577 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 00:07:57,930 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 00:08:03,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.43 vs. 
limit=15.0 2024-08-14 00:08:07,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2383030.0, ans=0.2 2024-08-14 00:08:13,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2383030.0, ans=0.1 2024-08-14 00:08:16,875 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 00:08:29,778 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 00:08:34,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6450, loss[loss=0.09856, beats_loss=0.009345, ecapa_loss=0.0001877, whisper_loss=0.08734, over 21948.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001617, whisper_loss=0.09167, over 3922003.89 frames. ], batch size: 92, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:08:37,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2383230.0, ans=0.0 2024-08-14 00:08:38,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2383230.0, ans=0.1 2024-08-14 00:08:39,129 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 00:08:48,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-14 00:08:52,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2383330.0, ans=0.125 2024-08-14 00:08:55,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. 
limit=15.0 2024-08-14 00:08:57,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2383330.0, ans=0.125 2024-08-14 00:09:04,586 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 00:09:06,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.325e+01 2.600e+01 2.932e+01 4.417e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-14 00:09:10,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.38 vs. limit=22.5 2024-08-14 00:09:12,860 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 00:09:13,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-14 00:09:15,292 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 00:09:15,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2383530.0, ans=0.0 2024-08-14 00:09:21,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2383530.0, ans=0.1 2024-08-14 00:09:23,942 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 00:09:28,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.70 vs. limit=15.0 2024-08-14 00:09:28,895 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 00:09:33,073 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 00:09:37,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6500, loss[loss=0.09534, beats_loss=0.01187, ecapa_loss=0.0001537, whisper_loss=0.08193, over 21927.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001608, whisper_loss=0.09165, over 3917537.25 frames. ], batch size: 92, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:09:37,931 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 00:09:49,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2383830.0, ans=0.125 2024-08-14 00:09:58,651 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 00:10:00,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2383830.0, ans=0.125 2024-08-14 00:10:01,183 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 00:10:01,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2383830.0, ans=0.125 2024-08-14 00:10:02,445 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 00:10:08,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2383930.0, ans=0.0 2024-08-14 00:10:08,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2383930.0, ans=0.0 2024-08-14 00:10:14,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2383930.0, ans=0.0 2024-08-14 00:10:23,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2384030.0, ans=0.125 2024-08-14 00:10:36,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2384130.0, ans=0.05 2024-08-14 00:10:36,990 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 00:10:37,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-08-14 00:10:38,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-14 00:10:40,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.81 vs. limit=15.0 2024-08-14 00:10:41,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6550, loss[loss=0.09821, beats_loss=0.01132, ecapa_loss=0.0001696, whisper_loss=0.0852, over 19819.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.00016, whisper_loss=0.0912, over 3906109.64 frames. ], batch size: 83, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:10:56,201 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 00:11:05,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2384330.0, ans=0.1 2024-08-14 00:11:07,806 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 00:11:10,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2384430.0, ans=0.125 2024-08-14 00:11:15,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.424e+01 2.648e+01 2.996e+01 4.448e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-14 00:11:17,738 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 00:11:18,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2384430.0, ans=0.125 2024-08-14 00:11:27,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2024-08-14 00:11:32,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-14 00:11:41,476 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 00:11:45,382 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09902217984199524, model_norm_threshold=52.96651840209961 2024-08-14 00:11:45,558 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.30, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.471e+04, grad_sumsq=8.471e+04, orig_rms_sq=1.000e+00 2024-08-14 00:11:45,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6600, loss[loss=0.1091, beats_loss=0.01002, ecapa_loss=0.0001565, whisper_loss=0.09752, over 16773.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001614, whisper_loss=0.09121, over 3912208.84 frames. ], batch size: 64, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:11:51,866 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 00:11:54,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2384730.0, ans=0.125 2024-08-14 00:12:00,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2384830.0, ans=0.1 2024-08-14 00:12:00,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2384830.0, ans=0.1 2024-08-14 00:12:01,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2384830.0, ans=0.2 2024-08-14 00:12:21,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2384930.0, ans=0.0 2024-08-14 00:12:40,243 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 00:12:49,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6650, loss[loss=0.1062, beats_loss=0.01111, ecapa_loss=0.0001471, whisper_loss=0.09358, over 22647.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001613, whisper_loss=0.09099, over 3922784.90 frames. ], batch size: 93, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:13:09,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2385330.0, ans=0.125 2024-08-14 00:13:15,823 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 00:13:20,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2385430.0, ans=0.0 2024-08-14 00:13:22,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.456e+01 2.724e+01 3.056e+01 5.349e+02, threshold=5.448e+01, percent-clipped=1.0 2024-08-14 00:13:25,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2385430.0, ans=0.125 2024-08-14 00:13:26,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2385530.0, ans=0.0 2024-08-14 00:13:29,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2385530.0, ans=0.5 2024-08-14 00:13:36,310 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 00:13:43,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2385630.0, ans=0.1 2024-08-14 00:13:51,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2385630.0, ans=0.125 2024-08-14 00:13:53,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6700, loss[loss=0.1333, beats_loss=0.01011, ecapa_loss=0.0001814, whisper_loss=0.1214, over 21882.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001603, whisper_loss=0.09112, over 3905552.96 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:14:02,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2385730.0, ans=0.125 2024-08-14 00:14:24,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2385930.0, ans=0.125 2024-08-14 00:14:30,526 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 00:14:40,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=12.0 2024-08-14 00:14:54,923 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 00:14:57,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-14 00:14:57,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6750, loss[loss=0.09023, beats_loss=0.01019, ecapa_loss=0.0001884, whisper_loss=0.07815, over 13591.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001617, whisper_loss=0.0911, over 3878276.50 frames. 
], batch size: 54, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:15:01,737 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 00:15:02,862 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 00:15:31,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.377e+01 2.658e+01 2.891e+01 6.359e+01, threshold=5.316e+01, percent-clipped=1.0 2024-08-14 00:15:34,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5 2024-08-14 00:15:35,380 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 00:15:43,440 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 00:16:02,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6800, loss[loss=0.09256, beats_loss=0.0104, ecapa_loss=0.0001652, whisper_loss=0.08051, over 19572.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001616, whisper_loss=0.09098, over 3883207.27 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:16:09,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2386730.0, ans=0.0 2024-08-14 00:16:21,619 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 00:16:25,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2386830.0, ans=0.0 2024-08-14 00:16:30,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2386930.0, ans=0.1 2024-08-14 00:16:35,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2386930.0, ans=0.125 2024-08-14 00:16:47,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2387030.0, ans=0.0 2024-08-14 00:16:50,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2387030.0, ans=0.125 2024-08-14 00:16:50,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2387030.0, ans=0.125 2024-08-14 00:16:51,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2387030.0, ans=0.2 2024-08-14 00:16:53,662 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 00:16:53,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2387130.0, ans=0.125 2024-08-14 00:17:01,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. 
limit=15.0 2024-08-14 00:17:03,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2387130.0, ans=0.1 2024-08-14 00:17:06,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6850, loss[loss=0.1147, beats_loss=0.01107, ecapa_loss=0.000171, whisper_loss=0.1019, over 19835.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001608, whisper_loss=0.09092, over 3871742.00 frames. ], batch size: 80, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:17:41,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.424e+01 2.658e+01 2.894e+01 9.462e+01, threshold=5.316e+01, percent-clipped=2.0 2024-08-14 00:17:45,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2387530.0, ans=0.125 2024-08-14 00:17:47,713 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 00:17:57,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2387530.0, ans=0.0 2024-08-14 00:18:11,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6900, loss[loss=0.07293, beats_loss=0.01351, ecapa_loss=0.0001478, whisper_loss=0.05794, over 16660.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001607, whisper_loss=0.09098, over 3878021.59 frames. 
], batch size: 70, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:18:50,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2387930.0, ans=0.07 2024-08-14 00:19:08,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2388130.0, ans=0.0 2024-08-14 00:19:08,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2388130.0, ans=0.1 2024-08-14 00:19:13,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-08-14 00:19:14,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2388130.0, ans=0.125 2024-08-14 00:19:20,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 6950, loss[loss=0.09447, beats_loss=0.01056, ecapa_loss=0.0001282, whisper_loss=0.08263, over 16347.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001607, whisper_loss=0.09176, over 3855817.07 frames. ], batch size: 62, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:19:35,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2388330.0, ans=0.0 2024-08-14 00:19:57,611 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 00:20:00,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.466e+01 2.702e+01 3.028e+01 4.381e+01, threshold=5.405e+01, percent-clipped=0.0 2024-08-14 00:20:07,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2388530.0, ans=0.0 2024-08-14 00:20:15,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=15.0 2024-08-14 00:20:29,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2388630.0, ans=0.125 2024-08-14 00:20:35,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2388630.0, ans=0.125 2024-08-14 00:20:38,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7000, loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0002037, whisper_loss=0.09037, over 19347.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001602, whisper_loss=0.0917, over 3876093.96 frames. ], batch size: 78, lr: 3.69e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:20:55,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2388830.0, ans=0.2 2024-08-14 00:20:55,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-14 00:20:57,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.50 vs. 
limit=12.0 2024-08-14 00:21:02,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2388830.0, ans=0.0 2024-08-14 00:21:02,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2024-08-14 00:21:16,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=15.0 2024-08-14 00:21:33,020 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 00:21:38,905 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 00:21:55,418 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 00:21:58,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7050, loss[loss=0.1069, beats_loss=0.01181, ecapa_loss=0.0001555, whisper_loss=0.09356, over 19286.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001611, whisper_loss=0.09172, over 3876002.70 frames. ], batch size: 78, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:22:15,134 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 00:22:23,261 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.858e+01 2024-08-14 00:22:23,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-14 00:22:27,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. 
limit=15.0 2024-08-14 00:22:36,255 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:22:42,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.266e+01 2.592e+01 2.903e+01 1.485e+02, threshold=5.183e+01, percent-clipped=2.0 2024-08-14 00:22:54,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2389530.0, ans=0.0 2024-08-14 00:22:57,548 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 00:23:11,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2389630.0, ans=0.125 2024-08-14 00:23:12,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2389630.0, ans=0.0 2024-08-14 00:23:19,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7100, loss[loss=0.1226, beats_loss=0.01211, ecapa_loss=0.0001167, whisper_loss=0.1094, over 17416.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01068, ecapa_loss=0.0001603, whisper_loss=0.09241, over 3865667.33 frames. ], batch size: 67, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:23:24,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2389730.0, ans=0.0 2024-08-14 00:23:48,195 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 00:23:48,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-08-14 00:23:53,827 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 00:23:55,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2389930.0, ans=0.1 2024-08-14 00:24:04,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2389930.0, ans=0.125 2024-08-14 00:24:06,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2390030.0, ans=0.125 2024-08-14 00:24:39,425 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7150, loss[loss=0.09014, beats_loss=0.01107, ecapa_loss=0.0001681, whisper_loss=0.0774, over 21971.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0107, ecapa_loss=0.0001592, whisper_loss=0.09271, over 3880127.60 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:24:44,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2390230.0, ans=0.04949747468305833 2024-08-14 00:24:52,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2390230.0, ans=0.125 2024-08-14 00:25:10,417 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 00:25:20,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.385e+01 2.638e+01 3.035e+01 4.278e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-14 00:25:20,629 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-14 00:25:22,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2390430.0, ans=0.0 2024-08-14 00:25:23,468 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 00:25:25,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2390530.0, ans=0.125 2024-08-14 00:25:38,227 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:25:57,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7200, loss[loss=0.1012, beats_loss=0.008017, ecapa_loss=0.0001898, whisper_loss=0.09129, over 15489.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01076, ecapa_loss=0.0001591, whisper_loss=0.092, over 3889112.09 frames. ], batch size: 59, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:26:01,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2390730.0, ans=0.0 2024-08-14 00:26:06,800 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 00:26:31,187 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 00:26:40,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2390930.0, ans=0.125 2024-08-14 00:27:03,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2391130.0, ans=0.0 2024-08-14 00:27:04,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2391130.0, ans=0.125 2024-08-14 00:27:09,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2391130.0, ans=0.125 2024-08-14 00:27:14,483 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7250, loss[loss=0.07823, beats_loss=0.01154, ecapa_loss=0.0001806, whisper_loss=0.06489, over 14949.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001606, whisper_loss=0.09168, over 3896261.57 frames. ], batch size: 65, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:27:15,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=12.0 2024-08-14 00:27:16,187 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 00:27:32,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2391330.0, ans=0.125 2024-08-14 00:27:40,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2391330.0, ans=10.0 2024-08-14 00:27:55,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.401e+01 2.589e+01 2.911e+01 7.095e+01, threshold=5.179e+01, percent-clipped=1.0 2024-08-14 00:28:04,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-14 00:28:17,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2391530.0, ans=0.1 2024-08-14 00:28:19,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2391630.0, ans=0.0 2024-08-14 00:28:26,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2024-08-14 00:28:27,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2391630.0, ans=0.1 2024-08-14 00:28:31,195 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
28 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 00:28:33,706 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7300, loss[loss=0.1198, beats_loss=0.01121, ecapa_loss=0.0001675, whisper_loss=0.1069, over 22013.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01073, ecapa_loss=0.0001602, whisper_loss=0.09184, over 3906542.26 frames. ], batch size: 88, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:28:37,268 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 00:28:38,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.42 vs. limit=10.0 2024-08-14 00:28:43,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2391730.0, ans=0.125 2024-08-14 00:29:21,475 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 00:29:24,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-14 00:29:50,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5 2024-08-14 00:29:50,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7350, loss[loss=0.099, beats_loss=0.008626, ecapa_loss=0.0001749, whisper_loss=0.08863, over 20141.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001599, whisper_loss=0.09078, over 3880526.26 frames. ], batch size: 80, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:30:14,196 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 00:30:19,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2392330.0, ans=0.0 2024-08-14 00:30:19,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2392330.0, ans=0.0 2024-08-14 00:30:22,808 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 00:30:23,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2392430.0, ans=0.125 2024-08-14 00:30:26,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2392430.0, ans=0.125 2024-08-14 00:30:28,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2392430.0, ans=0.0 2024-08-14 00:30:28,732 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.736e+00 2024-08-14 00:30:31,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2392430.0, ans=0.125 2024-08-14 00:30:32,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.397e+01 2.587e+01 2.821e+01 4.137e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-14 00:30:34,865 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 00:30:45,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2392530.0, ans=0.0 2024-08-14 00:31:09,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. 
limit=15.0 2024-08-14 00:31:12,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7400, loss[loss=0.09089, beats_loss=0.01129, ecapa_loss=0.0001732, whisper_loss=0.07787, over 22248.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001599, whisper_loss=0.09115, over 3901432.83 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:31:29,544 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 00:31:37,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2392830.0, ans=0.035 2024-08-14 00:31:53,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2392930.0, ans=0.0 2024-08-14 00:31:54,761 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 00:31:56,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2392930.0, ans=0.2 2024-08-14 00:31:59,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2392930.0, ans=0.125 2024-08-14 00:32:09,038 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 00:32:17,019 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 00:32:22,812 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 19 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 00:32:23,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2393130.0, ans=0.125 2024-08-14 00:32:32,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. 
limit=15.0 2024-08-14 00:32:34,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7450, loss[loss=0.1106, beats_loss=0.01072, ecapa_loss=0.0002466, whisper_loss=0.09741, over 18410.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001602, whisper_loss=0.09104, over 3912842.27 frames. ], batch size: 80, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:32:55,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2393330.0, ans=0.125 2024-08-14 00:33:06,229 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 00:33:13,070 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 00:33:19,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.395e+01 2.642e+01 3.080e+01 4.669e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-14 00:33:25,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2024-08-14 00:33:35,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2393530.0, ans=0.125 2024-08-14 00:33:38,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2393530.0, ans=0.0 2024-08-14 00:33:40,723 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 00:33:46,176 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 00:34:21,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2393630.0, ans=0.0 2024-08-14 00:34:23,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2393730.0, ans=0.05 2024-08-14 00:34:24,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7500, loss[loss=0.1227, beats_loss=0.009025, ecapa_loss=0.0001299, whisper_loss=0.1124, over 19655.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001591, whisper_loss=0.09171, over 3937050.69 frames. ], batch size: 72, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:34:43,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2393830.0, ans=0.0 2024-08-14 00:34:51,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2393830.0, ans=0.125 2024-08-14 00:35:00,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2393930.0, ans=0.0 2024-08-14 00:35:11,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2393930.0, ans=0.2 2024-08-14 00:35:18,702 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 00:35:20,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2394030.0, ans=0.125 2024-08-14 00:35:26,617 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-14 00:35:27,908 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 00:35:42,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2394130.0, ans=0.125 2024-08-14 00:35:44,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7550, loss[loss=0.1129, beats_loss=0.009211, ecapa_loss=0.0001585, whisper_loss=0.1021, over 20290.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01069, ecapa_loss=0.0001589, whisper_loss=0.09204, over 3896471.76 frames. ], batch size: 79, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:35:45,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2394230.0, ans=0.2 2024-08-14 00:35:45,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=2394230.0, ans=12.0 2024-08-14 00:36:01,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2394330.0, ans=0.07 2024-08-14 00:36:06,504 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 00:36:13,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2394330.0, ans=0.125 2024-08-14 00:36:26,058 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.563e+01 2.921e+01 3.982e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-14 00:36:28,515 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 00:36:31,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.42 vs. 
limit=15.0 2024-08-14 00:36:55,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2394630.0, ans=0.0 2024-08-14 00:36:56,522 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 00:37:05,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7600, loss[loss=0.1025, beats_loss=0.009564, ecapa_loss=0.0001721, whisper_loss=0.09117, over 20246.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01065, ecapa_loss=0.0001589, whisper_loss=0.09264, over 3904054.98 frames. ], batch size: 81, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:37:11,273 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 00:37:13,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2394730.0, ans=0.125 2024-08-14 00:37:14,467 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-14 00:37:23,942 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:37:42,179 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 00:37:45,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2394930.0, ans=0.0 2024-08-14 00:37:47,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2394930.0, ans=0.125 2024-08-14 00:37:51,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2395030.0, ans=0.125 2024-08-14 00:38:00,874 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 00:38:09,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2395130.0, ans=0.0 2024-08-14 00:38:10,622 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 00:38:12,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2024-08-14 00:38:15,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2395130.0, ans=0.125 2024-08-14 00:38:21,439 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 00:38:23,136 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-14 00:38:24,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7650, loss[loss=0.1197, beats_loss=0.00705, ecapa_loss=0.0001861, whisper_loss=0.1108, over 14927.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01063, ecapa_loss=0.0001593, whisper_loss=0.0928, over 3899657.81 frames. 
], batch size: 57, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:38:32,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2395230.0, ans=0.1 2024-08-14 00:38:38,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2395230.0, ans=0.125 2024-08-14 00:38:42,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2395330.0, ans=0.025 2024-08-14 00:38:43,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2395330.0, ans=0.1 2024-08-14 00:38:45,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2395330.0, ans=0.0 2024-08-14 00:38:52,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2395330.0, ans=0.1 2024-08-14 00:38:55,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2395330.0, ans=0.125 2024-08-14 00:39:07,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.336e+01 2.593e+01 2.907e+01 5.798e+01, threshold=5.186e+01, percent-clipped=1.0 2024-08-14 00:39:16,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2395530.0, ans=0.0 2024-08-14 00:39:37,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2395630.0, ans=0.07 2024-08-14 00:39:44,731 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 00:39:47,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7700, loss[loss=0.09985, beats_loss=0.0123, ecapa_loss=0.0001689, whisper_loss=0.08586, over 22714.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01067, ecapa_loss=0.0001589, whisper_loss=0.09244, over 3897387.74 frames. ], batch size: 94, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:39:51,046 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-14 00:40:06,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2024-08-14 00:40:32,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5 2024-08-14 00:40:34,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2396030.0, ans=0.125 2024-08-14 00:40:56,450 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-14 00:40:57,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2396130.0, ans=0.05 2024-08-14 00:40:58,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2396130.0, ans=0.1 2024-08-14 00:40:59,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2396130.0, ans=0.0 2024-08-14 00:41:04,170 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 00:41:05,960 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 00:41:07,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7750, loss[loss=0.1045, beats_loss=0.009148, ecapa_loss=0.0001424, whisper_loss=0.09392, over 16433.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01068, ecapa_loss=0.0001583, whisper_loss=0.09211, over 3903507.42 frames. ], batch size: 62, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:41:20,091 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 19 from LS+wenet, 31 from Vox, 45 fro AS 2024-08-14 00:41:21,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-08-14 00:41:22,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2396330.0, ans=0.125 2024-08-14 00:41:32,813 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 00:41:36,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2396330.0, ans=0.2 2024-08-14 00:41:49,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.485e+01 2.781e+01 3.099e+01 5.095e+01, threshold=5.562e+01, percent-clipped=0.0 2024-08-14 00:41:54,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2396530.0, ans=0.125 2024-08-14 00:42:15,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2396630.0, ans=0.125 2024-08-14 00:42:23,249 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 00:42:26,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7800, loss[loss=0.1112, beats_loss=0.009901, ecapa_loss=0.0001826, whisper_loss=0.09947, over 21364.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001579, whisper_loss=0.09163, over 3930872.86 frames. ], batch size: 89, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:42:27,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2396730.0, ans=0.0 2024-08-14 00:42:38,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2396730.0, ans=0.025 2024-08-14 00:42:40,775 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 00:43:10,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2396930.0, ans=0.0 2024-08-14 00:43:16,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=12.0 2024-08-14 00:43:30,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2397030.0, ans=0.125 2024-08-14 00:43:49,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7850, loss[loss=0.1076, beats_loss=0.01077, ecapa_loss=0.0001524, whisper_loss=0.09534, over 24078.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01084, ecapa_loss=0.0001585, whisper_loss=0.09122, over 3901361.70 frames. ], batch size: 96, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:43:50,015 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 00:43:51,312 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 00:44:29,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2397430.0, ans=0.0 2024-08-14 00:44:30,187 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.344e+01 2.594e+01 2.942e+01 8.076e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-14 00:44:30,326 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 00:44:35,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2397530.0, ans=0.1 2024-08-14 00:45:00,384 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 35 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 00:45:04,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2397630.0, ans=0.125 2024-08-14 00:45:05,278 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 00:45:08,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7900, loss[loss=0.1018, beats_loss=0.009498, ecapa_loss=0.0002169, whisper_loss=0.09013, over 20161.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001594, whisper_loss=0.09188, over 3939969.04 frames. 
], batch size: 86, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:45:18,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2397730.0, ans=0.0 2024-08-14 00:45:25,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2397830.0, ans=0.09899494936611666 2024-08-14 00:45:28,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2397830.0, ans=0.2 2024-08-14 00:45:36,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2397830.0, ans=0.09899494936611666 2024-08-14 00:45:56,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2398030.0, ans=0.125 2024-08-14 00:46:07,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2398030.0, ans=0.5 2024-08-14 00:46:11,246 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 00:46:22,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-08-14 00:46:27,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 7950, loss[loss=0.116, beats_loss=0.008876, ecapa_loss=0.0001814, whisper_loss=0.1053, over 15411.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01083, ecapa_loss=0.00016, whisper_loss=0.09226, over 3966836.81 frames. 
], batch size: 60, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:46:29,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2398230.0, ans=0.125 2024-08-14 00:46:32,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2398230.0, ans=0.04949747468305833 2024-08-14 00:46:33,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2398230.0, ans=0.125 2024-08-14 00:46:43,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2398330.0, ans=0.125 2024-08-14 00:46:53,857 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 00:47:06,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.380e+01 2.671e+01 3.071e+01 4.593e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 00:47:18,562 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 00:47:20,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2024-08-14 00:47:22,123 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 00:47:28,265 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 00:47:41,103 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8000, loss[loss=0.1246, beats_loss=0.008824, ecapa_loss=0.0001753, whisper_loss=0.114, over 15922.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001591, whisper_loss=0.09233, over 3931207.19 frames. 
], batch size: 61, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:47:55,342 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 from AS 2024-08-14 00:48:05,568 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 from AS 2024-08-14 00:48:06,984 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 from AS 2024-08-14 00:48:20,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2398930.0, ans=0.125 2024-08-14 00:48:30,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2399030.0, ans=0.1 2024-08-14 00:48:35,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2399030.0, ans=0.125 2024-08-14 00:48:43,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2024-08-14 00:48:48,181 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 from AS 2024-08-14 00:48:57,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8050, loss[loss=0.09658, beats_loss=0.009986, ecapa_loss=0.0001611, whisper_loss=0.08498, over 17548.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01083, ecapa_loss=0.0001576, whisper_loss=0.09187, over 3919107.91 frames. 
], batch size: 70, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:49:06,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2399230.0, ans=0.2 2024-08-14 00:49:10,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2399330.0, ans=0.125 2024-08-14 00:49:13,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2399330.0, ans=15.0 2024-08-14 00:49:17,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2399330.0, ans=0.125 2024-08-14 00:49:33,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.422e+01 2.734e+01 3.214e+01 1.918e+02, threshold=5.469e+01, percent-clipped=2.0 2024-08-14 00:49:41,453 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS 2024-08-14 00:49:46,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-08-14 00:49:52,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-14 00:50:04,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2399630.0, ans=0.125 2024-08-14 00:50:10,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8100, loss[loss=0.1178, beats_loss=0.01234, ecapa_loss=0.0001593, whisper_loss=0.1039, over 22357.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01087, ecapa_loss=0.0001571, whisper_loss=0.09197, over 3922019.42 frames. 
], batch size: 90, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:50:11,451 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 from AS 2024-08-14 00:50:16,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2024-08-14 00:50:34,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=12.0 2024-08-14 00:50:55,313 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:51:10,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2400030.0, ans=0.0 2024-08-14 00:51:20,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2400130.0, ans=0.0 2024-08-14 00:51:28,725 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 00:51:32,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8150, loss[loss=0.09043, beats_loss=0.01178, ecapa_loss=0.0001714, whisper_loss=0.07693, over 16898.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01092, ecapa_loss=0.0001574, whisper_loss=0.09072, over 3897809.04 frames. ], batch size: 72, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:51:32,959 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 13 from Vox, 45 from AS 2024-08-14 00:52:04,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. 
limit=15.0 2024-08-14 00:52:10,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2400430.0, ans=0.125 2024-08-14 00:52:13,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.374e+01 2.607e+01 2.976e+01 8.538e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-14 00:52:27,270 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-14 00:52:40,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2400630.0, ans=0.025 2024-08-14 00:52:48,037 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 19 from Vox, 18 from AS 2024-08-14 00:52:49,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8200, loss[loss=0.1072, beats_loss=0.00782, ecapa_loss=0.000201, whisper_loss=0.09735, over 13788.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01089, ecapa_loss=0.0001577, whisper_loss=0.09048, over 3924035.05 frames. ], batch size: 55, lr: 3.68e-03, grad_scale: 5.764607523034235e+17 2024-08-14 00:52:51,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2024-08-14 00:52:56,556 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 from AS 2024-08-14 00:53:02,961 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 21 from Vox, 26 from AS 2024-08-14 00:53:12,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.86 vs. 
limit=12.0 2024-08-14 00:53:14,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2400830.0, ans=0.2 2024-08-14 00:53:30,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2024-08-14 00:54:06,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8250, loss[loss=0.1182, beats_loss=0.01057, ecapa_loss=0.0001455, whisper_loss=0.1062, over 13637.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001573, whisper_loss=0.09122, over 3894824.94 frames. ], batch size: 53, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:54:25,911 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 21 from Vox, 24 from AS 2024-08-14 00:54:41,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2401430.0, ans=0.125 2024-08-14 00:54:43,086 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 26 from Vox, 35 from AS 2024-08-14 00:54:46,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.415e+01 2.692e+01 3.047e+01 4.213e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-14 00:54:55,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2401530.0, ans=0.125 2024-08-14 00:54:55,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2401530.0, ans=0.2 2024-08-14 00:54:56,389 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 from AS 2024-08-14 00:55:00,843 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 from AS 2024-08-14 00:55:03,146 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-14 00:55:13,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-14 00:55:14,365 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 from AS 2024-08-14 00:55:18,457 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-14 00:55:19,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2401630.0, ans=0.2 2024-08-14 00:55:26,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8300, loss[loss=0.1046, beats_loss=0.01041, ecapa_loss=0.0001343, whisper_loss=0.09284, over 21956.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001567, whisper_loss=0.09043, over 3889758.22 frames. ], batch size: 86, lr: 3.68e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:55:27,978 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 9 from LS+wenet, 19 from Vox, 41 from AS 2024-08-14 00:55:36,767 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 from AS 2024-08-14 00:55:42,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2401830.0, ans=0.125 2024-08-14 00:55:49,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=15.0 2024-08-14 00:55:56,888 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 15 from Vox, 30 from AS 2024-08-14 00:56:06,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2401930.0, ans=0.0 2024-08-14 00:56:09,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-08-14 00:56:11,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2401930.0, ans=0.0 2024-08-14 00:56:14,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=12.0 2024-08-14 00:56:25,396 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 from AS 2024-08-14 00:56:36,203 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS 2024-08-14 00:56:42,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-14 00:56:52,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8350, loss[loss=0.1198, beats_loss=0.01005, ecapa_loss=0.0001541, whisper_loss=0.1082, over 17739.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01089, ecapa_loss=0.0001577, whisper_loss=0.09042, over 3895468.29 frames. ], batch size: 69, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:56:58,631 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 from AS 2024-08-14 00:57:01,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2402230.0, ans=0.125 2024-08-14 00:57:16,755 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 17 from Vox, 47 from AS 2024-08-14 00:57:21,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2402330.0, ans=0.125 2024-08-14 00:57:36,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.282e+01 2.635e+01 3.067e+01 5.691e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 00:57:47,745 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 21 from Vox, 29 from AS 2024-08-14 00:57:50,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-14 00:58:03,538 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 from AS 2024-08-14 00:58:18,322 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8400, loss[loss=0.08636, beats_loss=0.01392, ecapa_loss=0.0001367, whisper_loss=0.07108, over 23603.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01095, ecapa_loss=0.0001584, whisper_loss=0.09024, over 3897721.31 frames. ], batch size: 97, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:58:27,808 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 from AS 2024-08-14 00:58:28,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2402730.0, ans=0.125 2024-08-14 00:58:42,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2402830.0, ans=0.0 2024-08-14 00:58:52,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.54 vs. 
limit=22.5 2024-08-14 00:58:54,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2402930.0, ans=0.025 2024-08-14 00:59:20,890 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 14 from Vox, 30 from AS 2024-08-14 00:59:27,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2403130.0, ans=0.2 2024-08-14 00:59:42,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8450, loss[loss=0.1049, beats_loss=0.007743, ecapa_loss=0.0002345, whisper_loss=0.09478, over 17916.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001597, whisper_loss=0.09082, over 3883468.60 frames. ], batch size: 77, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 00:59:54,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0 2024-08-14 01:00:26,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.360e+01 2.603e+01 2.918e+01 4.445e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 01:00:30,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.48 vs. limit=22.5 2024-08-14 01:00:36,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2403530.0, ans=0.0 2024-08-14 01:00:44,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-14 01:00:44,942 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 01:00:50,365 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-14 01:01:02,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2403630.0, ans=0.1 2024-08-14 01:01:04,201 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 from AS 2024-08-14 01:01:08,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8500, loss[loss=0.1062, beats_loss=0.01116, ecapa_loss=0.0001431, whisper_loss=0.09357, over 23221.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01082, ecapa_loss=0.0001588, whisper_loss=0.0906, over 3890159.87 frames. ], batch size: 93, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:01:17,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2403730.0, ans=0.125 2024-08-14 01:01:23,350 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.074e-02 2024-08-14 01:01:27,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2403830.0, ans=0.0 2024-08-14 01:01:29,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2403830.0, ans=0.125 2024-08-14 01:01:34,368 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 22 from Vox, 31 from AS 2024-08-14 01:02:33,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8550, loss[loss=0.09155, beats_loss=0.01015, ecapa_loss=0.000177, whisper_loss=0.07963, over 20146.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001584, whisper_loss=0.09063, over 3895621.46 frames. ], batch size: 84, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:02:50,523 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 21 from Vox, 20 from AS 2024-08-14 01:02:50,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2404330.0, ans=0.0 2024-08-14 01:03:18,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.347e+01 2.626e+01 2.928e+01 4.701e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 01:03:25,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2404530.0, ans=0.125 2024-08-14 01:03:32,513 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS 2024-08-14 01:03:38,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2404530.0, ans=0.0 2024-08-14 01:03:47,845 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 19 from Vox, 24 from AS 2024-08-14 01:03:48,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0 2024-08-14 01:04:02,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8600, loss[loss=0.09624, beats_loss=0.0112, ecapa_loss=0.0001739, whisper_loss=0.0833, over 16830.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001597, whisper_loss=0.09163, over 3874390.45 frames. ], batch size: 68, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:04:15,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2024-08-14 01:04:28,958 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
34 from LS+wenet, 20 from Vox, 40 from AS 2024-08-14 01:04:37,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2404930.0, ans=0.07 2024-08-14 01:04:42,737 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-14 01:04:51,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-14 01:04:54,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2405030.0, ans=0.0 2024-08-14 01:04:55,619 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 18 from Vox, 40 from AS 2024-08-14 01:05:01,102 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 23 from Vox, 30 from AS 2024-08-14 01:05:10,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2024-08-14 01:05:20,154 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 35 from Vox, 27 from AS 2024-08-14 01:05:29,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8650, loss[loss=0.1084, beats_loss=0.01011, ecapa_loss=0.000194, whisper_loss=0.09634, over 23059.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001604, whisper_loss=0.09124, over 3883820.56 frames. ], batch size: 92, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:05:40,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.45 vs. 
limit=22.5 2024-08-14 01:05:41,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2405230.0, ans=0.125 2024-08-14 01:05:41,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=15.0 2024-08-14 01:06:00,058 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 16 from Vox, 44 from AS 2024-08-14 01:06:06,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2405430.0, ans=0.1 2024-08-14 01:06:07,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2405430.0, ans=0.125 2024-08-14 01:06:08,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2405430.0, ans=0.125 2024-08-14 01:06:15,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.307e+01 2.549e+01 2.918e+01 2.030e+02, threshold=5.098e+01, percent-clipped=1.0 2024-08-14 01:06:38,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2405630.0, ans=0.125 2024-08-14 01:06:41,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2405630.0, ans=0.0 2024-08-14 01:06:49,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2405630.0, ans=0.1 2024-08-14 01:06:51,444 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 from AS 2024-08-14 01:06:58,178 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8700, loss[loss=0.07668, beats_loss=0.01271, ecapa_loss=0.000164, whisper_loss=0.06233, over 22023.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001595, whisper_loss=0.09074, over 3892530.89 frames. ], batch size: 93, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:07:51,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2406030.0, ans=0.0 2024-08-14 01:08:01,438 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 01:08:03,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2406130.0, ans=0.125 2024-08-14 01:08:06,145 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-14 01:08:22,225 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8750, loss[loss=0.1017, beats_loss=0.009005, ecapa_loss=0.0001595, whisper_loss=0.09112, over 14891.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001606, whisper_loss=0.09078, over 3876587.08 frames. ], batch size: 56, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:08:22,811 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-14 01:08:30,763 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:09:04,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2406430.0, ans=0.125 2024-08-14 01:09:14,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.356e+01 2.644e+01 3.033e+01 3.229e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 01:09:21,580 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 13 from Vox, 41 from AS 2024-08-14 01:09:25,832 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 13 from Vox, 25 from AS 2024-08-14 01:09:38,525 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS 2024-08-14 01:09:40,807 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 30 from LS+wenet, 12 from Vox, 25 from AS 2024-08-14 01:09:43,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2406630.0, ans=0.125 2024-08-14 01:10:04,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2406730.0, ans=0.2 2024-08-14 01:10:05,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8800, loss[loss=0.09906, beats_loss=0.01267, ecapa_loss=0.0001452, whisper_loss=0.08493, over 20008.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.09105, over 3879355.18 frames. ], batch size: 79, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:10:25,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2406830.0, ans=0.07 2024-08-14 01:10:35,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2406830.0, ans=0.1 2024-08-14 01:10:40,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2406830.0, ans=0.125 2024-08-14 01:10:42,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2406930.0, ans=0.125 2024-08-14 01:10:46,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2406930.0, ans=0.125 2024-08-14 01:10:56,874 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
15 from LS+wenet, 21 from Vox, 32 from AS 2024-08-14 01:11:21,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2407030.0, ans=0.125 2024-08-14 01:11:28,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2407130.0, ans=0.2 2024-08-14 01:11:52,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8850, loss[loss=0.08575, beats_loss=0.01184, ecapa_loss=0.0001054, whisper_loss=0.07285, over 16443.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001592, whisper_loss=0.09112, over 3855406.88 frames. ], batch size: 60, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:12:04,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2407230.0, ans=0.1 2024-08-14 01:12:20,593 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS 2024-08-14 01:12:32,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2407330.0, ans=0.0 2024-08-14 01:12:53,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.365e+01 2.669e+01 3.063e+01 4.484e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-14 01:13:02,107 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:13:07,263 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 18 from Vox, 26 from AS 2024-08-14 01:13:08,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2407530.0, ans=0.09899494936611666 2024-08-14 01:13:30,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2407630.0, ans=0.2 2024-08-14 01:13:31,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2024-08-14 01:13:32,889 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 from AS 2024-08-14 01:13:45,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8900, loss[loss=0.1031, beats_loss=0.01109, ecapa_loss=0.0001334, whisper_loss=0.09067, over 23239.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.09126, over 3846418.18 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:13:46,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2407730.0, ans=0.125 2024-08-14 01:13:58,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2407730.0, ans=0.0 2024-08-14 01:13:59,568 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 from AS 2024-08-14 01:14:33,417 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 18 from Vox, 24 from AS 2024-08-14 01:14:49,087 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 18 from Vox, 45 from AS 2024-08-14 01:15:17,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2408130.0, ans=0.2 2024-08-14 01:15:26,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 8950, loss[loss=0.1021, beats_loss=0.008645, ecapa_loss=0.0001696, whisper_loss=0.09181, over 18372.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001577, whisper_loss=0.09071, over 3850019.81 frames. ], batch size: 73, lr: 3.67e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:15:31,246 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 from AS 2024-08-14 01:16:07,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-14 01:16:15,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-08-14 01:16:15,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.300e+01 2.488e+01 2.810e+01 4.417e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-14 01:16:15,773 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-14 01:16:16,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.36 vs. 
limit=15.0
2024-08-14 01:16:27,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2408530.0, ans=0.2
2024-08-14 01:16:33,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2408530.0, ans=0.125
2024-08-14 01:16:33,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2408530.0, ans=0.0
2024-08-14 01:16:50,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9000, loss[loss=0.1205, beats_loss=0.009662, ecapa_loss=0.000212, whisper_loss=0.1087, over 21556.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.000158, whisper_loss=0.09052, over 3858468.16 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:16:50,479 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-14 01:17:02,740 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3776, 3.5300, 2.3778, 3.8257], device='cuda:3')
2024-08-14 01:17:32,662 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005618, whisper_loss=0.2481, over 922467.00 frames.
2024-08-14 01:17:50,453 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on SV_voxceleb1: loss=0.004363, beats_loss=0, ecapa_loss=0.0004363, whisper_loss=0, over 939242.00 frames.
2024-08-14 01:20:00,319 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on AT_audioset: loss=0.02365, beats_loss=0.02365, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 01:20:00,328 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB
2024-08-14 01:20:13,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.35 vs.
limit=15.0
2024-08-14 01:20:16,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0
2024-08-14 01:20:19,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2408830.0, ans=0.125
2024-08-14 01:20:22,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2408830.0, ans=0.125
2024-08-14 01:20:23,161 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 27 from Vox, 31 from AS
2024-08-14 01:20:42,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2409030.0, ans=0.125
2024-08-14 01:20:53,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2409030.0, ans=0.0
2024-08-14 01:21:01,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2409130.0, ans=0.125
2024-08-14 01:21:01,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2024-08-14 01:21:08,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0
2024-08-14 01:21:13,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9050, loss[loss=0.09408, beats_loss=0.01061, ecapa_loss=0.0001379, whisper_loss=0.08209, over 15092.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001592, whisper_loss=0.09101, over 3859980.65 frames. ], batch size: 59, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:21:32,177 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
28 from LS+wenet, 25 from Vox, 33 from AS
2024-08-14 01:21:43,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2409430.0, ans=0.125
2024-08-14 01:21:52,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0
2024-08-14 01:21:52,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.446e+01 2.670e+01 2.988e+01 4.436e+01, threshold=5.340e+01, percent-clipped=0.0
2024-08-14 01:21:58,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0
2024-08-14 01:22:00,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2409530.0, ans=0.1
2024-08-14 01:22:13,921 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 from AS
2024-08-14 01:22:17,005 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 01:22:28,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9100, loss[loss=0.09676, beats_loss=0.01372, ecapa_loss=0.0001241, whisper_loss=0.08181, over 16569.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01069, ecapa_loss=0.0001595, whisper_loss=0.09159, over 3873644.23 frames. ], batch size: 66, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:23:02,010 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 13 from Vox, 35 from AS
2024-08-14 01:23:03,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2409930.0, ans=0.125
2024-08-14 01:23:16,777 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
24 from LS+wenet, 17 from Vox, 18 from AS
2024-08-14 01:23:17,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0
2024-08-14 01:23:21,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410030.0, ans=0.125
2024-08-14 01:23:24,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2410030.0, ans=0.125
2024-08-14 01:23:36,368 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 19 from Vox, 28 from AS
2024-08-14 01:23:41,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2410130.0, ans=0.125
2024-08-14 01:23:48,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9150, loss[loss=0.109, beats_loss=0.008649, ecapa_loss=0.000207, whisper_loss=0.09829, over 15874.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01065, ecapa_loss=0.0001596, whisper_loss=0.09203, over 3874029.73 frames. ], batch size: 65, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:23:55,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0
2024-08-14 01:23:56,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2410230.0, ans=0.125
2024-08-14 01:24:05,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2410330.0, ans=0.0
2024-08-14 01:24:28,284 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
22 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 01:24:29,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.433e+01 2.654e+01 2.886e+01 8.462e+01, threshold=5.308e+01, percent-clipped=1.0
2024-08-14 01:24:51,546 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 34 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 01:24:58,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2410630.0, ans=0.1
2024-08-14 01:25:05,944 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 from AS
2024-08-14 01:25:08,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9200, loss[loss=0.119, beats_loss=0.009643, ecapa_loss=0.000149, whisper_loss=0.1079, over 23718.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01059, ecapa_loss=0.0001595, whisper_loss=0.09214, over 3899492.27 frames. ], batch size: 91, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:25:18,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2410730.0, ans=0.1
2024-08-14 01:25:21,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2410730.0, ans=0.0
2024-08-14 01:25:25,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2410830.0, ans=0.0
2024-08-14 01:25:32,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2410830.0, ans=0.0
2024-08-14 01:25:40,579 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 11 from Vox, 29 from AS
2024-08-14 01:25:52,982 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts.
23 from LS+wenet, 21 from Vox, 36 from AS
2024-08-14 01:25:58,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2411030.0, ans=0.125
2024-08-14 01:26:11,407 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 from AS
2024-08-14 01:26:18,066 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 from AS
2024-08-14 01:26:21,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.69 vs. limit=22.5
2024-08-14 01:26:30,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9250, loss[loss=0.09628, beats_loss=0.0118, ecapa_loss=0.0001557, whisper_loss=0.08292, over 17518.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001599, whisper_loss=0.09103, over 3893021.85 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:26:43,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2411230.0, ans=0.0
2024-08-14 01:26:43,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2411230.0, ans=0.125
2024-08-14 01:26:46,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=15.0
2024-08-14 01:26:52,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2024-08-14 01:26:55,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2411330.0, ans=0.2
2024-08-14 01:27:00,206 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
18 from LS+wenet, 14 from Vox, 27 from AS
2024-08-14 01:27:10,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.291e+01 2.608e+01 2.884e+01 5.366e+01, threshold=5.217e+01, percent-clipped=1.0
2024-08-14 01:27:11,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2411430.0, ans=0.2
2024-08-14 01:27:27,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2411530.0, ans=0.125
2024-08-14 01:27:32,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2411630.0, ans=0.125
2024-08-14 01:27:42,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2411630.0, ans=0.07
2024-08-14 01:27:49,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9300, loss[loss=0.07064, beats_loss=0.01377, ecapa_loss=0.0001795, whisper_loss=0.05507, over 16904.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001591, whisper_loss=0.09129, over 3885883.65 frames. ], batch size: 72, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:28:25,708 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 01:28:25,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2411930.0, ans=0.125
2024-08-14 01:28:31,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.94 vs.
limit=22.5
2024-08-14 01:29:05,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2412130.0, ans=0.07
2024-08-14 01:29:05,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2412130.0, ans=0.0
2024-08-14 01:29:06,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2412230.0, ans=0.2
2024-08-14 01:29:07,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9350, loss[loss=0.1083, beats_loss=0.01164, ecapa_loss=0.0001372, whisper_loss=0.09533, over 15019.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001588, whisper_loss=0.09109, over 3881912.60 frames. ], batch size: 59, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:29:09,138 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 25 from Vox, 26 from AS
2024-08-14 01:29:18,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2412230.0, ans=0.04949747468305833
2024-08-14 01:29:22,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2412330.0, ans=0.125
2024-08-14 01:29:25,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2412330.0, ans=0.0
2024-08-14 01:29:29,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2412330.0, ans=0.125
2024-08-14 01:29:32,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2412330.0, ans=0.0
2024-08-14 01:29:37,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs.
limit=15.0
2024-08-14 01:29:45,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2412430.0, ans=0.1
2024-08-14 01:29:47,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.279e+01 2.558e+01 2.915e+01 7.467e+01, threshold=5.116e+01, percent-clipped=2.0
2024-08-14 01:29:55,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2412530.0, ans=0.125
2024-08-14 01:30:26,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9400, loss[loss=0.1095, beats_loss=0.009787, ecapa_loss=0.0001677, whisper_loss=0.09803, over 16327.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001582, whisper_loss=0.09125, over 3857627.93 frames. ], batch size: 65, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:30:26,547 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS
2024-08-14 01:30:35,376 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 28 from Vox, 22 from AS
2024-08-14 01:30:38,334 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts.
18 from LS+wenet, 16 from Vox, 19 from AS
2024-08-14 01:30:38,658 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.372e+05
2024-08-14 01:30:45,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2412830.0, ans=0.1
2024-08-14 01:30:53,758 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.157e-03
2024-08-14 01:30:55,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2412830.0, ans=0.09899494936611666
2024-08-14 01:31:02,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2412930.0, ans=0.125
2024-08-14 01:31:30,155 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 16 from Vox, 17 from AS
2024-08-14 01:31:48,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2413230.0, ans=0.125
2024-08-14 01:31:48,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9450, loss[loss=0.1081, beats_loss=0.008212, ecapa_loss=0.0001439, whisper_loss=0.09844, over 19462.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001591, whisper_loss=0.0908, over 3830606.99 frames. ], batch size: 75, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:32:14,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2413330.0, ans=0.0
2024-08-14 01:32:21,891 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
24 from LS+wenet, 28 from Vox, 27 from AS
2024-08-14 01:32:30,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2413430.0, ans=0.0
2024-08-14 01:32:35,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.507e+01 2.797e+01 3.259e+01 9.131e+01, threshold=5.593e+01, percent-clipped=2.0
2024-08-14 01:32:46,304 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 from AS
2024-08-14 01:32:56,239 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 12 from Vox, 39 from AS
2024-08-14 01:33:16,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9500, loss[loss=0.108, beats_loss=0.01114, ecapa_loss=0.0001469, whisper_loss=0.09539, over 22841.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001603, whisper_loss=0.09082, over 3857617.09 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:33:22,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2413730.0, ans=0.0
2024-08-14 01:33:25,109 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS
2024-08-14 01:33:32,997 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 from AS
2024-08-14 01:33:39,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2413830.0, ans=0.125
2024-08-14 01:33:43,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2413830.0, ans=0.125
2024-08-14 01:33:47,101 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts.
26 from LS+wenet, 25 from Vox, 30 from AS
2024-08-14 01:33:47,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2413830.0, ans=0.125
2024-08-14 01:33:48,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2413930.0, ans=0.125
2024-08-14 01:33:51,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2413930.0, ans=0.2
2024-08-14 01:34:02,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2413930.0, ans=0.0
2024-08-14 01:34:02,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2413930.0, ans=0.0
2024-08-14 01:34:05,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2414030.0, ans=0.0
2024-08-14 01:34:34,106 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 from AS
2024-08-14 01:34:36,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9550, loss[loss=0.07995, beats_loss=0.01209, ecapa_loss=0.0001797, whisper_loss=0.06606, over 17963.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001603, whisper_loss=0.09123, over 3854082.10 frames. ], batch size: 79, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:34:48,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2414230.0, ans=0.0
2024-08-14 01:34:54,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2414330.0, ans=0.125
2024-08-14 01:35:06,919 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
19 from LS+wenet, 20 from Vox, 18 from AS
2024-08-14 01:35:15,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2414430.0, ans=0.2
2024-08-14 01:35:17,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.395e+01 2.666e+01 3.161e+01 6.328e+01, threshold=5.331e+01, percent-clipped=1.0
2024-08-14 01:35:25,712 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 30 from Vox, 41 from AS
2024-08-14 01:35:26,069 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.586e+01
2024-08-14 01:35:27,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2414530.0, ans=0.0
2024-08-14 01:35:36,102 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 from AS
2024-08-14 01:35:42,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2414630.0, ans=0.1
2024-08-14 01:35:57,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9600, loss[loss=0.113, beats_loss=0.00951, ecapa_loss=0.0001688, whisper_loss=0.1018, over 16138.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001607, whisper_loss=0.09119, over 3825430.59 frames. ], batch size: 62, lr: 3.67e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:35:59,840 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 01:36:03,468 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.767e+00
2024-08-14 01:36:07,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.45 vs.
limit=22.5
2024-08-14 01:36:27,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2414830.0, ans=0.125
2024-08-14 01:37:04,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2415030.0, ans=0.125
2024-08-14 01:37:14,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2415130.0, ans=0.0
2024-08-14 01:37:26,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9650, loss[loss=0.1008, beats_loss=0.01015, ecapa_loss=0.0001496, whisper_loss=0.08912, over 19247.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001614, whisper_loss=0.09149, over 3837565.62 frames. ], batch size: 75, lr: 3.66e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:37:44,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2415330.0, ans=0.125
2024-08-14 01:37:55,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2415330.0, ans=10.0
2024-08-14 01:38:02,007 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 01:38:09,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.345e+01 2.616e+01 2.966e+01 4.263e+01, threshold=5.231e+01, percent-clipped=0.0
2024-08-14 01:38:49,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9700, loss[loss=0.1074, beats_loss=0.0112, ecapa_loss=0.000157, whisper_loss=0.09459, over 18842.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001616, whisper_loss=0.09138, over 3868452.55 frames.
], batch size: 75, lr: 3.66e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:38:51,528 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.174e+00
2024-08-14 01:38:57,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2415730.0, ans=0.5
2024-08-14 01:39:01,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0
2024-08-14 01:39:06,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2415830.0, ans=0.125
2024-08-14 01:39:12,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2415830.0, ans=0.07
2024-08-14 01:39:25,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2415930.0, ans=0.125
2024-08-14 01:39:45,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2416030.0, ans=0.2
2024-08-14 01:39:52,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2416130.0, ans=0.125
2024-08-14 01:40:06,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2416130.0, ans=0.125
2024-08-14 01:40:07,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2416130.0, ans=0.125
2024-08-14 01:40:10,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9750, loss[loss=0.1051, beats_loss=0.01064, ecapa_loss=0.0001569, whisper_loss=0.09286, over 21890.00 frames.
], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001603, whisper_loss=0.09094, over 3863936.68 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:40:36,216 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 from AS
2024-08-14 01:40:51,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.366e+01 2.693e+01 3.078e+01 7.887e+01, threshold=5.385e+01, percent-clipped=1.0
2024-08-14 01:40:59,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2416530.0, ans=0.1
2024-08-14 01:41:03,203 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS
2024-08-14 01:41:11,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2416630.0, ans=0.0
2024-08-14 01:41:16,128 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 22 from Vox, 26 from AS
2024-08-14 01:41:21,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2416630.0, ans=0.1
2024-08-14 01:41:26,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9800, loss[loss=0.08831, beats_loss=0.0109, ecapa_loss=0.000172, whisper_loss=0.07569, over 18709.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001598, whisper_loss=0.09094, over 3860353.99 frames. ], batch size: 77, lr: 3.66e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:41:31,322 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 15 from Vox, 39 from AS
2024-08-14 01:41:45,840 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
13 from LS+wenet, 10 from Vox, 31 from AS
2024-08-14 01:42:07,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0
2024-08-14 01:42:37,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9850, loss[loss=0.09561, beats_loss=0.01036, ecapa_loss=0.0001616, whisper_loss=0.08363, over 18977.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.00016, whisper_loss=0.09117, over 3884237.80 frames. ], batch size: 78, lr: 3.66e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:42:42,803 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 10 from Vox, 35 from AS
2024-08-14 01:43:03,664 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 from AS
2024-08-14 01:43:07,835 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 from AS
2024-08-14 01:43:10,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2417430.0, ans=0.0
2024-08-14 01:43:11,530 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.319e+01 2.530e+01 2.883e+01 5.906e+01, threshold=5.059e+01, percent-clipped=1.0
2024-08-14 01:43:13,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2417430.0, ans=0.125
2024-08-14 01:43:24,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2417530.0, ans=0.035
2024-08-14 01:43:30,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2417630.0, ans=0.0
2024-08-14 01:43:35,482 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
21 from LS+wenet, 13 from Vox, 21 from AS
2024-08-14 01:43:36,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0
2024-08-14 01:43:39,300 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 from AS
2024-08-14 01:43:42,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2417630.0, ans=0.0
2024-08-14 01:43:44,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9900, loss[loss=0.1058, beats_loss=0.01013, ecapa_loss=0.0001662, whisper_loss=0.09403, over 14556.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01079, ecapa_loss=0.000159, whisper_loss=0.09146, over 3884801.34 frames. ], batch size: 58, lr: 3.66e-03, grad_scale: 1.152921504606847e+18
2024-08-14 01:43:45,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2417730.0, ans=0.0
2024-08-14 01:43:47,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2417730.0, ans=0.1
2024-08-14 01:43:50,183 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 from AS
2024-08-14 01:43:54,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0
2024-08-14 01:43:55,390 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 01:44:35,372 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:44:48,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2418130.0, ans=0.2 2024-08-14 01:44:50,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2418130.0, ans=0.0 2024-08-14 01:44:52,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 9950, loss[loss=0.1179, beats_loss=0.01166, ecapa_loss=0.0001752, whisper_loss=0.1045, over 22525.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001586, whisper_loss=0.09189, over 3890610.27 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:44:56,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.42 vs. limit=22.5 2024-08-14 01:44:57,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2418230.0, ans=0.2 2024-08-14 01:45:04,564 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 01:45:26,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.412e+01 2.652e+01 3.138e+01 4.371e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-14 01:45:42,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2418530.0, ans=15.0 2024-08-14 01:45:47,531 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 01:45:54,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2418630.0, ans=0.0 2024-08-14 01:45:59,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10000, loss[loss=0.09625, beats_loss=0.0127, ecapa_loss=0.0001679, whisper_loss=0.08187, over 20966.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01075, ecapa_loss=0.0001601, whisper_loss=0.09212, over 3921289.53 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 1.152921504606847e+18 2024-08-14 01:45:59,963 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 01:46:06,765 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 01:46:11,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2418730.0, ans=0.1 2024-08-14 01:46:15,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2024-08-14 01:46:22,793 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 29 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 01:46:52,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2419130.0, ans=10.0 2024-08-14 01:46:54,647 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 01:47:06,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10050, loss[loss=0.105, beats_loss=0.01045, ecapa_loss=0.000189, whisper_loss=0.09264, over 21955.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001601, whisper_loss=0.09225, over 3928146.66 frames. 
], batch size: 93, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:47:20,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2419330.0, ans=0.1 2024-08-14 01:47:30,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-14 01:47:42,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.384e+01 2.686e+01 2.960e+01 2.282e+02, threshold=5.371e+01, percent-clipped=3.0 2024-08-14 01:47:55,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2419530.0, ans=0.125 2024-08-14 01:48:04,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0 2024-08-14 01:48:05,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=12.0 2024-08-14 01:48:14,341 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10100, loss[loss=0.1062, beats_loss=0.01049, ecapa_loss=0.0001662, whisper_loss=0.09401, over 22114.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.00016, whisper_loss=0.0915, over 3914341.53 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:48:38,063 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 01:48:38,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2419830.0, ans=0.1 2024-08-14 01:48:40,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. 
limit=15.0 2024-08-14 01:49:00,269 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 01:49:01,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2024-08-14 01:49:01,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-14 01:49:04,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2420030.0, ans=0.04949747468305833 2024-08-14 01:49:17,244 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 01:49:20,375 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 32 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 01:49:24,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10150, loss[loss=0.09175, beats_loss=0.0114, ecapa_loss=0.0001635, whisper_loss=0.07872, over 21561.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.0001603, whisper_loss=0.09116, over 3929111.07 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:49:30,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2420230.0, ans=0.2 2024-08-14 01:49:40,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2420330.0, ans=0.0 2024-08-14 01:50:05,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.645e+01 2.951e+01 4.259e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-14 01:50:34,870 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 01:50:43,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10200, loss[loss=0.1044, beats_loss=0.01054, ecapa_loss=0.0001885, whisper_loss=0.09197, over 15732.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001612, whisper_loss=0.09116, over 3920450.05 frames. ], batch size: 63, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:50:46,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-08-14 01:50:47,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2420730.0, ans=0.2 2024-08-14 01:50:49,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2420730.0, ans=0.125 2024-08-14 01:51:04,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-14 01:51:25,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2420930.0, ans=0.025 2024-08-14 01:51:25,471 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.767e-03 2024-08-14 01:51:26,844 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 14 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 01:52:02,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2421130.0, ans=0.2 2024-08-14 01:52:08,498 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 01:52:13,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10250, loss[loss=0.118, beats_loss=0.009306, ecapa_loss=0.000157, whisper_loss=0.1071, over 21768.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001607, whisper_loss=0.09189, over 3938960.51 frames. ], batch size: 88, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:52:27,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-08-14 01:52:42,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-14 01:52:47,230 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 01:53:01,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.474e+01 2.733e+01 3.124e+01 2.948e+02, threshold=5.467e+01, percent-clipped=2.0 2024-08-14 01:53:22,526 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 01:53:27,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2421630.0, ans=0.2 2024-08-14 01:53:35,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2421630.0, ans=0.125 2024-08-14 01:53:41,340 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 01:53:42,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10300, loss[loss=0.127, beats_loss=0.008671, ecapa_loss=0.00018, whisper_loss=0.1165, over 22724.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01063, ecapa_loss=0.0001613, whisper_loss=0.09249, over 3953244.65 frames. 
], batch size: 88, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:53:48,742 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 01:53:57,503 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-14 01:54:04,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2421830.0, ans=0.04949747468305833 2024-08-14 01:54:04,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2024-08-14 01:54:35,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2421930.0, ans=0.0 2024-08-14 01:54:44,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2422030.0, ans=0.125 2024-08-14 01:54:54,556 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 01:54:54,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2422130.0, ans=0.0 2024-08-14 01:54:59,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2422130.0, ans=0.1 2024-08-14 01:55:11,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10350, loss[loss=0.09726, beats_loss=0.01155, ecapa_loss=0.0001627, whisper_loss=0.08409, over 22184.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001605, whisper_loss=0.09228, over 3957206.87 frames. 
], batch size: 93, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:55:18,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2422230.0, ans=0.0 2024-08-14 01:55:25,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2422230.0, ans=0.125 2024-08-14 01:55:25,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.51 vs. limit=10.0 2024-08-14 01:55:39,392 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 01:55:48,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-14 01:55:56,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.357e+01 2.601e+01 3.091e+01 4.779e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-14 01:55:59,802 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 01:56:04,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2422530.0, ans=0.125 2024-08-14 01:56:17,396 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 01:56:21,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2422630.0, ans=0.2 2024-08-14 01:56:32,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10400, loss[loss=0.1136, beats_loss=0.008161, ecapa_loss=0.000121, whisper_loss=0.1042, over 16077.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01068, ecapa_loss=0.000159, whisper_loss=0.09205, over 3933703.89 frames. 
], batch size: 58, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:57:13,632 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-14 01:57:23,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2423030.0, ans=0.125 2024-08-14 01:57:41,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10450, loss[loss=0.1051, beats_loss=0.01274, ecapa_loss=0.000118, whisper_loss=0.09115, over 20866.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01073, ecapa_loss=0.0001594, whisper_loss=0.09127, over 3895351.55 frames. ], batch size: 83, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:57:44,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2423230.0, ans=0.025 2024-08-14 01:57:57,258 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 01:57:58,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-14 01:58:16,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.463e+01 2.702e+01 3.082e+01 4.541e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-14 01:58:22,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2423530.0, ans=0.0 2024-08-14 01:58:33,320 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 01:58:40,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2423630.0, ans=0.125 2024-08-14 01:58:47,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10500, loss[loss=0.1008, beats_loss=0.01276, ecapa_loss=0.0001293, whisper_loss=0.08676, over 22570.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001605, whisper_loss=0.09091, over 3876576.25 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 01:58:50,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-08-14 01:59:13,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2423930.0, ans=0.2 2024-08-14 01:59:17,883 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 01:59:33,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2424030.0, ans=0.0 2024-08-14 01:59:51,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2424230.0, ans=0.1 2024-08-14 01:59:52,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10550, loss[loss=0.1009, beats_loss=0.01158, ecapa_loss=0.0001554, whisper_loss=0.08777, over 17499.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0108, ecapa_loss=0.0001611, whisper_loss=0.09038, over 3865686.64 frames. ], batch size: 69, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:00:02,313 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.69 vs. 
limit=10.0 2024-08-14 02:00:05,347 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-14 02:00:05,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2424330.0, ans=0.0 2024-08-14 02:00:05,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2424330.0, ans=0.0 2024-08-14 02:00:12,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-14 02:00:16,978 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 37 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 02:00:19,459 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-14 02:00:27,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.316e+01 2.599e+01 2.857e+01 9.329e+01, threshold=5.198e+01, percent-clipped=3.0 2024-08-14 02:00:29,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. 
limit=15.0 2024-08-14 02:00:32,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2424530.0, ans=0.2 2024-08-14 02:00:35,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2424530.0, ans=0.0 2024-08-14 02:00:46,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2424630.0, ans=0.1 2024-08-14 02:00:47,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2424630.0, ans=0.125 2024-08-14 02:00:57,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10600, loss[loss=0.09454, beats_loss=0.0117, ecapa_loss=0.000147, whisper_loss=0.08137, over 17709.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001616, whisper_loss=0.09066, over 3874642.12 frames. ], batch size: 72, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:01:02,717 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.782e-03 2024-08-14 02:01:31,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=12.0 2024-08-14 02:01:40,857 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 02:01:47,360 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 02:01:48,692 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 02:01:56,580 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 02:02:00,576 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 02:02:01,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10650, loss[loss=0.1242, beats_loss=0.007884, ecapa_loss=0.0001496, whisper_loss=0.1148, over 16522.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001589, whisper_loss=0.09061, over 3873652.60 frames. ], batch size: 61, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:02:07,022 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 02:02:08,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2425230.0, ans=0.2 2024-08-14 02:02:12,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2425230.0, ans=0.125 2024-08-14 02:02:25,448 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 02:02:32,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2425430.0, ans=0.125 2024-08-14 02:02:33,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2425430.0, ans=0.1 2024-08-14 02:02:34,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2425430.0, ans=0.1 2024-08-14 02:02:37,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.364e+01 2.670e+01 2.895e+01 4.194e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-14 02:02:43,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2425530.0, ans=0.1 2024-08-14 02:02:51,543 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
19 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 02:02:53,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2425630.0, ans=0.125 2024-08-14 02:03:05,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2425630.0, ans=0.0 2024-08-14 02:03:06,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2425730.0, ans=0.125 2024-08-14 02:03:07,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10700, loss[loss=0.1002, beats_loss=0.0111, ecapa_loss=0.0001662, whisper_loss=0.08743, over 18602.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01085, ecapa_loss=0.0001591, whisper_loss=0.09028, over 3869528.13 frames. ], batch size: 76, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:03:10,013 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 02:03:12,587 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 02:03:26,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2425830.0, ans=0.2 2024-08-14 02:03:28,454 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-14 02:03:30,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2425830.0, ans=0.125 2024-08-14 02:03:37,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2425930.0, ans=0.0 2024-08-14 02:03:41,534 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 02:03:46,106 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.388e-01 2024-08-14 02:03:52,094 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 02:03:52,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2426030.0, ans=0.09899494936611666 2024-08-14 02:03:58,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2426030.0, ans=0.0 2024-08-14 02:03:58,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2024-08-14 02:04:06,104 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 02:04:13,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10750, loss[loss=0.07672, beats_loss=0.01215, ecapa_loss=0.000202, whisper_loss=0.06255, over 16721.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01087, ecapa_loss=0.000159, whisper_loss=0.08987, over 3843166.07 frames. ], batch size: 73, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:04:23,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2426230.0, ans=0.2 2024-08-14 02:04:23,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.97 vs. 
limit=12.0 2024-08-14 02:04:27,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2426330.0, ans=0.0 2024-08-14 02:04:49,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.445e+01 2.667e+01 2.966e+01 4.209e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-14 02:04:59,844 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 02:05:04,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2426530.0, ans=0.125 2024-08-14 02:05:06,797 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 02:05:07,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.63 vs. limit=15.0 2024-08-14 02:05:20,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10800, loss[loss=0.07069, beats_loss=0.01256, ecapa_loss=0.0001721, whisper_loss=0.05641, over 21739.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01089, ecapa_loss=0.0001593, whisper_loss=0.09034, over 3882683.76 frames. ], batch size: 92, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:05:21,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2426730.0, ans=0.125 2024-08-14 02:05:22,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2426730.0, ans=0.125 2024-08-14 02:05:31,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2426730.0, ans=0.125 2024-08-14 02:05:45,992 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 25 from Vox, 16 fro AS 2024-08-14 02:05:53,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-14 02:06:08,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-14 02:06:15,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5 2024-08-14 02:06:31,149 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 02:06:39,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10850, loss[loss=0.1143, beats_loss=0.01134, ecapa_loss=0.000142, whisper_loss=0.1016, over 20982.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01091, ecapa_loss=0.0001601, whisper_loss=0.08982, over 3872297.49 frames. ], batch size: 81, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:06:40,092 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 02:06:40,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2427230.0, ans=0.035 2024-08-14 02:06:58,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2427330.0, ans=0.125 2024-08-14 02:07:11,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2427430.0, ans=0.125 2024-08-14 02:07:17,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2427430.0, ans=0.5 2024-08-14 02:07:18,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2427430.0, ans=0.125 2024-08-14 02:07:22,001 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.381e+01 2.677e+01 3.006e+01 4.441e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-14 02:07:36,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2427530.0, ans=0.0 2024-08-14 02:07:42,168 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 02:07:46,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-14 02:07:48,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.92 vs. limit=22.5 2024-08-14 02:07:51,766 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 02:07:57,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10900, loss[loss=0.1048, beats_loss=0.01107, ecapa_loss=0.0001536, whisper_loss=0.09223, over 19228.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01081, ecapa_loss=0.0001596, whisper_loss=0.09104, over 3905951.64 frames. ], batch size: 76, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:08:03,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2024-08-14 02:08:10,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2427830.0, ans=0.0 2024-08-14 02:08:41,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2428030.0, ans=0.125 2024-08-14 02:08:55,185 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 02:08:59,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2428130.0, ans=0.0 2024-08-14 02:09:03,356 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 02:09:09,160 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 10950, loss[loss=0.1136, beats_loss=0.01073, ecapa_loss=0.0001528, whisper_loss=0.1014, over 22261.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001594, whisper_loss=0.09186, over 3914941.33 frames. ], batch size: 89, lr: 3.66e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:09:26,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2428330.0, ans=0.125 2024-08-14 02:09:27,367 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 02:09:40,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-14 02:09:46,058 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.387e+01 2.678e+01 3.232e+01 4.538e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-14 02:09:54,832 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.275e+01 2024-08-14 02:09:58,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2428530.0, ans=0.125 2024-08-14 02:10:02,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2428630.0, ans=0.1 2024-08-14 02:10:08,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2428630.0, ans=0.125 2024-08-14 02:10:09,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2024-08-14 02:10:14,637 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 02:10:17,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11000, loss[loss=0.09309, beats_loss=0.01222, ecapa_loss=0.0001555, whisper_loss=0.07932, over 15173.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001601, whisper_loss=0.09147, over 3898593.43 frames. ], batch size: 61, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:10:37,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2428830.0, ans=0.5 2024-08-14 02:10:40,908 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 02:10:42,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2428930.0, ans=0.0 2024-08-14 02:10:43,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-14 02:10:54,567 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 02:10:57,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2429030.0, ans=0.1 2024-08-14 02:11:23,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11050, loss[loss=0.1143, beats_loss=0.01081, ecapa_loss=0.0001415, whisper_loss=0.102, over 23950.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.0001593, whisper_loss=0.09192, over 3945023.32 frames. ], batch size: 92, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:11:29,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2429230.0, ans=0.1 2024-08-14 02:11:45,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2429330.0, ans=0.125 2024-08-14 02:11:53,416 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 02:11:58,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.348e+01 2.595e+01 2.854e+01 6.191e+01, threshold=5.189e+01, percent-clipped=1.0 2024-08-14 02:12:14,298 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-14 02:12:18,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2429630.0, ans=0.2 2024-08-14 02:12:28,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11100, loss[loss=0.08491, beats_loss=0.01322, ecapa_loss=0.000151, whisper_loss=0.07018, over 21755.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.000159, whisper_loss=0.09173, over 3944477.72 frames. ], batch size: 89, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:12:39,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-14 02:13:01,695 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 02:13:25,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2430130.0, ans=0.125 2024-08-14 02:13:35,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11150, loss[loss=0.12, beats_loss=0.008392, ecapa_loss=0.0001811, whisper_loss=0.1098, over 22496.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.00016, whisper_loss=0.09131, over 3896852.15 frames. ], batch size: 89, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:13:53,649 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 02:14:08,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2430430.0, ans=0.125 2024-08-14 02:14:11,430 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 02:14:12,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.319e+01 2.556e+01 2.861e+01 3.873e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-14 02:14:18,051 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 02:14:23,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=12.0 2024-08-14 02:14:28,240 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 02:14:39,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2430630.0, ans=0.125 2024-08-14 02:14:39,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2430630.0, ans=0.1 2024-08-14 02:14:43,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11200, loss[loss=0.1016, beats_loss=0.01175, ecapa_loss=0.0001156, whisper_loss=0.08866, over 22790.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01069, ecapa_loss=0.0001594, whisper_loss=0.09157, over 3918953.07 frames. ], batch size: 88, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:14:52,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2430730.0, ans=0.2 2024-08-14 02:15:45,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2024-08-14 02:15:48,201 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 02:15:50,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11250, loss[loss=0.09368, beats_loss=0.01091, ecapa_loss=0.0001512, whisper_loss=0.08126, over 21359.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01063, ecapa_loss=0.0001597, whisper_loss=0.09185, over 3924976.34 frames. ], batch size: 88, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:15:51,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2431230.0, ans=0.1 2024-08-14 02:15:58,026 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.511e+01 2024-08-14 02:16:13,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2431330.0, ans=15.0 2024-08-14 02:16:22,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=2431430.0, ans=0.1 2024-08-14 02:16:27,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.401e+01 2.755e+01 3.055e+01 4.281e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-14 02:16:28,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2431430.0, ans=0.2 2024-08-14 02:16:28,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2024-08-14 02:16:34,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2431530.0, ans=0.1 2024-08-14 02:16:48,441 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.314e+01 2024-08-14 02:16:50,678 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 02:16:57,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11300, loss[loss=0.1125, beats_loss=0.009159, ecapa_loss=0.0001889, whisper_loss=0.1014, over 21368.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01063, ecapa_loss=0.00016, whisper_loss=0.09246, over 3934897.59 frames. ], batch size: 89, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:17:00,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2431730.0, ans=0.125 2024-08-14 02:17:02,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-14 02:17:09,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2024-08-14 02:17:12,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2431830.0, ans=0.0 2024-08-14 02:17:15,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2431830.0, ans=0.125 2024-08-14 02:17:52,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2432130.0, ans=0.0 2024-08-14 02:17:52,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-08-14 02:17:56,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2432130.0, ans=0.125 2024-08-14 02:18:04,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11350, loss[loss=0.08799, beats_loss=0.009993, ecapa_loss=0.0002343, whisper_loss=0.07565, over 16591.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.0106, ecapa_loss=0.0001597, whisper_loss=0.0928, over 3952202.20 frames. ], batch size: 72, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:18:08,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2024-08-14 02:18:22,498 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-14 02:18:25,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-14 02:18:32,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2432430.0, ans=0.2 2024-08-14 02:18:41,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.318e+01 2.543e+01 2.878e+01 4.882e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-14 02:18:48,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2432530.0, ans=0.0 2024-08-14 02:18:48,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2432530.0, ans=0.2 2024-08-14 02:19:07,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2432630.0, ans=0.125 2024-08-14 02:19:10,318 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-14 02:19:11,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11400, loss[loss=0.1169, beats_loss=0.00808, ecapa_loss=0.0001494, whisper_loss=0.1073, over 14875.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001598, whisper_loss=0.09231, over 3952112.80 frames. 
], batch size: 55, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:19:21,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2024-08-14 02:19:43,088 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 02:19:45,331 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-14 02:19:57,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2433030.0, ans=0.125 2024-08-14 02:20:04,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-14 02:20:11,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2433130.0, ans=0.125 2024-08-14 02:20:17,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-14 02:20:18,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11450, loss[loss=0.1019, beats_loss=0.01251, ecapa_loss=0.000157, whisper_loss=0.08782, over 16060.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01063, ecapa_loss=0.0001606, whisper_loss=0.09236, over 3925373.86 frames. ], batch size: 67, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:20:21,329 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 02:20:29,740 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 02:20:47,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2433430.0, ans=0.125 2024-08-14 02:20:54,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2433430.0, ans=0.015 2024-08-14 02:20:56,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.11 vs. limit=6.0 2024-08-14 02:20:56,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.469e+01 2.659e+01 2.977e+01 4.724e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 02:21:02,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2433530.0, ans=0.1 2024-08-14 02:21:17,333 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 02:21:18,484 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 02:21:27,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11500, loss[loss=0.08912, beats_loss=0.01071, ecapa_loss=0.000135, whisper_loss=0.07706, over 16951.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001598, whisper_loss=0.09164, over 3920943.25 frames. ], batch size: 63, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:21:32,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2433730.0, ans=10.0 2024-08-14 02:21:54,410 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 02:21:56,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2433930.0, ans=0.0 2024-08-14 02:22:16,207 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:22:17,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-14 02:22:21,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2434130.0, ans=0.0 2024-08-14 02:22:28,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-14 02:22:34,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11550, loss[loss=0.1114, beats_loss=0.01175, ecapa_loss=0.0001806, whisper_loss=0.09782, over 22232.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001596, whisper_loss=0.09142, over 3929795.11 frames. ], batch size: 93, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:22:34,233 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 02:22:40,655 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 02:23:10,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.439e+01 2.716e+01 3.080e+01 3.847e+02, threshold=5.432e+01, percent-clipped=2.0 2024-08-14 02:23:20,364 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 02:23:29,538 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 02:23:33,670 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 02:23:41,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11600, loss[loss=0.09235, beats_loss=0.01075, ecapa_loss=0.0001706, whisper_loss=0.07989, over 14834.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001596, whisper_loss=0.09121, over 3912907.29 frames. ], batch size: 60, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:23:43,854 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 02:23:47,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=15.0 2024-08-14 02:24:19,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2434930.0, ans=0.04949747468305833 2024-08-14 02:24:25,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2435030.0, ans=0.0 2024-08-14 02:24:29,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2435030.0, ans=0.125 2024-08-14 02:24:35,809 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 19 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 02:24:52,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11650, loss[loss=0.09131, beats_loss=0.01141, ecapa_loss=0.0001732, whisper_loss=0.07817, over 15223.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01084, ecapa_loss=0.0001595, whisper_loss=0.09056, over 3908500.81 frames. ], batch size: 63, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:24:54,185 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
22 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 02:25:00,208 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:25:00,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2435230.0, ans=0.0 2024-08-14 02:25:01,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2435230.0, ans=0.1 2024-08-14 02:25:06,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.08 vs. limit=15.0 2024-08-14 02:25:32,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.452e+01 2.685e+01 3.038e+01 4.511e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-14 02:25:36,389 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 02:25:40,800 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 02:25:58,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2435630.0, ans=15.0 2024-08-14 02:26:05,757 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 02:26:06,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11700, loss[loss=0.1174, beats_loss=0.009705, ecapa_loss=0.0001531, whisper_loss=0.1062, over 20604.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.0001591, whisper_loss=0.09093, over 3932489.61 frames. ], batch size: 80, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:26:12,773 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 02:26:30,430 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 02:26:31,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2435830.0, ans=0.0 2024-08-14 02:27:20,436 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 02:27:23,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2436230.0, ans=0.0 2024-08-14 02:27:23,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11750, loss[loss=0.1185, beats_loss=0.009073, ecapa_loss=0.0001283, whisper_loss=0.1082, over 18399.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01088, ecapa_loss=0.0001592, whisper_loss=0.09112, over 3918592.25 frames. ], batch size: 67, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:27:44,350 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 02:27:55,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-14 02:28:01,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2436430.0, ans=0.125 2024-08-14 02:28:03,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.416e+01 2.659e+01 3.015e+01 1.752e+02, threshold=5.317e+01, percent-clipped=1.0 2024-08-14 02:28:07,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2436530.0, ans=0.1 2024-08-14 02:28:34,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. 
limit=15.0 2024-08-14 02:28:37,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11800, loss[loss=0.1056, beats_loss=0.01119, ecapa_loss=0.0001479, whisper_loss=0.09292, over 16768.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.0001595, whisper_loss=0.09118, over 3912297.73 frames. ], batch size: 67, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:28:42,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2436730.0, ans=0.125 2024-08-14 02:28:46,015 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 02:28:51,356 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 02:28:58,022 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 02:29:07,006 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 02:29:09,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2436930.0, ans=0.125 2024-08-14 02:29:38,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2437130.0, ans=0.1 2024-08-14 02:29:39,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2437130.0, ans=0.125 2024-08-14 02:29:41,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. 
limit=15.0 2024-08-14 02:29:43,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2437130.0, ans=0.125 2024-08-14 02:29:46,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11850, loss[loss=0.09696, beats_loss=0.01147, ecapa_loss=0.0001627, whisper_loss=0.08386, over 14025.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.0001597, whisper_loss=0.09134, over 3908794.45 frames. ], batch size: 55, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:29:46,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2437230.0, ans=0.0 2024-08-14 02:29:56,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2437230.0, ans=0.0 2024-08-14 02:30:18,257 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 02:30:22,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2437430.0, ans=0.125 2024-08-14 02:30:23,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.460e+01 2.783e+01 3.243e+01 6.982e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-14 02:30:32,005 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 02:30:45,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0 2024-08-14 02:30:46,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2437630.0, ans=0.2 2024-08-14 02:30:49,019 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 02:30:49,413 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.200e+00 2024-08-14 02:30:55,395 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11900, loss[loss=0.08819, beats_loss=0.01203, ecapa_loss=0.0001739, whisper_loss=0.07442, over 15567.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001598, whisper_loss=0.09127, over 3898468.89 frames. ], batch size: 64, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:30:58,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2437730.0, ans=0.0 2024-08-14 02:31:30,431 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 02:31:43,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2438030.0, ans=0.0 2024-08-14 02:31:54,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2438130.0, ans=0.1 2024-08-14 02:32:03,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 11950, loss[loss=0.09337, beats_loss=0.01166, ecapa_loss=0.0001596, whisper_loss=0.08012, over 16688.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001593, whisper_loss=0.09095, over 3867213.36 frames. ], batch size: 70, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:32:13,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-08-14 02:32:41,289 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
14 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 02:32:42,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.634e+01 2.951e+01 4.369e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-14 02:32:47,486 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 02:33:15,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12000, loss[loss=0.08657, beats_loss=0.01313, ecapa_loss=0.0001697, whisper_loss=0.07174, over 20911.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001589, whisper_loss=0.09094, over 3844465.06 frames. ], batch size: 92, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:33:15,486 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 02:34:00,845 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005541, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 02:34:21,443 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on SV_voxceleb1: loss=0.004448, beats_loss=0, ecapa_loss=0.0004448, whisper_loss=0, over 939242.00 frames. 2024-08-14 02:36:27,299 INFO [train_multi_KD3.py:1149] (3/4) Epoch 17, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 02:36:27,303 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 02:36:47,285 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 02:37:19,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2439030.0, ans=0.0 2024-08-14 02:37:20,679 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 02:37:22,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-14 02:37:25,409 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 02:37:33,542 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 15 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 02:37:36,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12050, loss[loss=0.1027, beats_loss=0.009531, ecapa_loss=0.0001738, whisper_loss=0.09144, over 20989.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01086, ecapa_loss=0.0001594, whisper_loss=0.09, over 3855473.08 frames. ], batch size: 87, lr: 3.65e-03, grad_scale: 1.152921504606847e+18 2024-08-14 02:37:51,375 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 02:38:14,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.389e+01 2.572e+01 2.864e+01 7.729e+01, threshold=5.144e+01, percent-clipped=2.0 2024-08-14 02:38:22,282 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-14 02:38:26,269 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 02:38:44,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12100, loss[loss=0.09677, beats_loss=0.009852, ecapa_loss=0.000146, whisper_loss=0.08545, over 19507.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01082, ecapa_loss=0.0001589, whisper_loss=0.09046, over 3844366.61 frames. ], batch size: 75, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:38:46,831 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
22 from LS+wenet, 10 from Vox, 29 from AS 2024-08-14 02:39:07,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=12.0 2024-08-14 02:39:09,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2439830.0, ans=0.125 2024-08-14 02:39:32,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2024-08-14 02:39:32,856 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 17 from Vox, 41 from AS 2024-08-14 02:39:37,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0 2024-08-14 02:39:53,861 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 from AS 2024-08-14 02:40:01,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12150, loss[loss=0.1007, beats_loss=0.01116, ecapa_loss=0.0001532, whisper_loss=0.08806, over 22786.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01083, ecapa_loss=0.0001582, whisper_loss=0.09037, over 3855258.45 frames. ], batch size: 94, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:40:07,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2440230.0, ans=0.0 2024-08-14 02:40:26,875 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 from AS 2024-08-14 02:40:32,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.94 vs. 
limit=15.0 2024-08-14 02:40:44,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.448e+01 2.795e+01 3.138e+01 2.484e+02, threshold=5.590e+01, percent-clipped=2.0 2024-08-14 02:40:54,045 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 02:41:04,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-14 02:41:10,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.58 vs. limit=22.5 2024-08-14 02:41:15,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2440630.0, ans=0.125 2024-08-14 02:41:18,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12200, loss[loss=0.08715, beats_loss=0.01436, ecapa_loss=0.000147, whisper_loss=0.07131, over 17373.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01084, ecapa_loss=0.0001583, whisper_loss=0.09003, over 3828800.06 frames. ], batch size: 73, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:41:34,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2024-08-14 02:41:35,325 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 from AS 2024-08-14 02:41:37,151 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
30 from LS+wenet, 15 from Vox, 22 from AS 2024-08-14 02:41:50,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2440930.0, ans=0.125 2024-08-14 02:41:56,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2440930.0, ans=0.2 2024-08-14 02:41:57,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2440930.0, ans=10.0 2024-08-14 02:42:05,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2441030.0, ans=0.125 2024-08-14 02:42:14,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2441030.0, ans=0.2 2024-08-14 02:42:33,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12250, loss[loss=0.1078, beats_loss=0.0112, ecapa_loss=0.0001731, whisper_loss=0.09488, over 22470.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001585, whisper_loss=0.09022, over 3834303.23 frames. ], batch size: 93, lr: 3.65e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:42:36,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=15.0 2024-08-14 02:42:45,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2441230.0, ans=0.1 2024-08-14 02:42:50,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2441330.0, ans=0.125 2024-08-14 02:42:53,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. 
limit=6.0 2024-08-14 02:43:01,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0 2024-08-14 02:43:14,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.529e+01 2.845e+01 3.228e+01 1.360e+02, threshold=5.691e+01, percent-clipped=2.0 2024-08-14 02:43:22,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2441530.0, ans=0.125 2024-08-14 02:43:37,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2024-08-14 02:43:46,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12300, loss[loss=0.1178, beats_loss=0.00859, ecapa_loss=0.0001952, whisper_loss=0.1072, over 18822.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001593, whisper_loss=0.09021, over 3851374.14 frames. ], batch size: 78, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:43:54,078 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 02:43:58,300 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 11 from Vox, 48 from AS 2024-08-14 02:44:04,158 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 from AS 2024-08-14 02:44:21,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2441930.0, ans=0.1 2024-08-14 02:44:29,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. 
limit=15.0 2024-08-14 02:44:32,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2442030.0, ans=0.0 2024-08-14 02:44:32,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2024-08-14 02:44:33,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2024-08-14 02:44:37,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-14 02:44:45,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2442130.0, ans=0.2 2024-08-14 02:44:52,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-14 02:44:56,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12350, loss[loss=0.1302, beats_loss=0.008206, ecapa_loss=0.0001568, whisper_loss=0.1204, over 18361.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001604, whisper_loss=0.0915, over 3867625.41 frames. ], batch size: 69, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:44:58,437 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 from AS 2024-08-14 02:45:01,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. 
limit=15.0 2024-08-14 02:45:23,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2442430.0, ans=0.125 2024-08-14 02:45:34,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.343e+01 2.707e+01 2.893e+01 7.539e+01, threshold=5.413e+01, percent-clipped=2.0 2024-08-14 02:45:36,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2442530.0, ans=0.0 2024-08-14 02:45:39,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2442530.0, ans=0.035 2024-08-14 02:45:49,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2442630.0, ans=0.125 2024-08-14 02:45:56,770 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 02:45:58,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2442630.0, ans=0.0 2024-08-14 02:46:03,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12400, loss[loss=0.1078, beats_loss=0.009278, ecapa_loss=0.0001603, whisper_loss=0.09693, over 23145.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.00016, whisper_loss=0.09148, over 3864237.66 frames. ], batch size: 92, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:46:12,664 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 35 from LS+wenet, 16 from Vox, 26 from AS 2024-08-14 02:46:40,161 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-14 02:46:44,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2443030.0, ans=0.2 2024-08-14 02:46:44,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2443030.0, ans=0.125 2024-08-14 02:46:47,987 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 from AS 2024-08-14 02:46:48,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2443030.0, ans=0.5 2024-08-14 02:46:54,805 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 02:47:07,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12450, loss[loss=0.1206, beats_loss=0.01001, ecapa_loss=0.0001495, whisper_loss=0.1091, over 23309.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001597, whisper_loss=0.09159, over 3910180.53 frames. ], batch size: 90, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:47:10,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2443230.0, ans=0.125 2024-08-14 02:47:15,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.58 vs. 
limit=12.0 2024-08-14 02:47:17,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2443230.0, ans=0.125 2024-08-14 02:47:26,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2443330.0, ans=0.0 2024-08-14 02:47:36,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2443430.0, ans=0.0 2024-08-14 02:47:44,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.385e+01 2.629e+01 3.074e+01 4.896e+01, threshold=5.258e+01, percent-clipped=0.0 2024-08-14 02:47:44,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2443430.0, ans=0.1 2024-08-14 02:47:48,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2443530.0, ans=0.1 2024-08-14 02:47:51,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2443530.0, ans=0.0 2024-08-14 02:47:53,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2443530.0, ans=0.125 2024-08-14 02:48:12,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12500, loss[loss=0.1082, beats_loss=0.01103, ecapa_loss=0.0001469, whisper_loss=0.09569, over 22465.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01056, ecapa_loss=0.0001602, whisper_loss=0.09157, over 3859936.87 frames. ], batch size: 91, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:48:18,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-14 02:48:36,212 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 24 from Vox, 33 from AS 2024-08-14 02:48:49,050 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 18 from Vox, 40 from AS 2024-08-14 02:48:53,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2444030.0, ans=0.125 2024-08-14 02:48:58,180 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 02:49:11,347 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 from AS 2024-08-14 02:49:13,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2444130.0, ans=0.125 2024-08-14 02:49:17,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12550, loss[loss=0.1277, beats_loss=0.009773, ecapa_loss=0.0001412, whisper_loss=0.1165, over 15750.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01063, ecapa_loss=0.0001601, whisper_loss=0.09158, over 3886863.20 frames. ], batch size: 59, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:49:33,680 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 24 from Vox, 27 from AS 2024-08-14 02:49:41,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2444330.0, ans=0.1 2024-08-14 02:49:51,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2444430.0, ans=15.0 2024-08-14 02:49:54,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.347e+01 2.679e+01 3.056e+01 5.301e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-14 02:50:06,420 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
12 from LS+wenet, 16 from Vox, 36 from AS 2024-08-14 02:50:22,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12600, loss[loss=0.1092, beats_loss=0.01236, ecapa_loss=0.0001418, whisper_loss=0.09541, over 17615.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0106, ecapa_loss=0.0001597, whisper_loss=0.09216, over 3863934.76 frames. ], batch size: 70, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:50:24,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2444730.0, ans=0.0 2024-08-14 02:50:29,146 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 23 from Vox, 28 from AS 2024-08-14 02:50:35,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2444830.0, ans=0.0 2024-08-14 02:50:39,203 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 from AS 2024-08-14 02:50:39,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-14 02:50:58,729 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS 2024-08-14 02:51:10,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2445030.0, ans=0.2 2024-08-14 02:51:27,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12650, loss[loss=0.1142, beats_loss=0.009366, ecapa_loss=0.0001514, whisper_loss=0.1034, over 14219.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001602, whisper_loss=0.09127, over 3862085.97 frames. ], batch size: 54, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:51:27,448 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
27 from LS+wenet, 14 from Vox, 26 from AS 2024-08-14 02:51:50,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0 2024-08-14 02:52:03,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.374e+01 2.633e+01 2.976e+01 1.427e+02, threshold=5.265e+01, percent-clipped=1.0 2024-08-14 02:52:08,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2445530.0, ans=0.2 2024-08-14 02:52:12,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2445530.0, ans=0.125 2024-08-14 02:52:18,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2445630.0, ans=0.2 2024-08-14 02:52:30,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2445630.0, ans=0.2 2024-08-14 02:52:32,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12700, loss[loss=0.1045, beats_loss=0.01124, ecapa_loss=0.0001564, whisper_loss=0.09166, over 17606.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001594, whisper_loss=0.09129, over 3855640.60 frames. ], batch size: 70, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:52:42,772 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
27 from LS+wenet, 20 from Vox, 28 from AS 2024-08-14 02:52:48,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2445830.0, ans=0.125 2024-08-14 02:53:00,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2445930.0, ans=0.0 2024-08-14 02:53:12,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2446030.0, ans=0.0 2024-08-14 02:53:12,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-08-14 02:53:16,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2024-08-14 02:53:23,612 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 18 from Vox, 42 from AS 2024-08-14 02:53:28,788 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 from AS 2024-08-14 02:53:31,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2446130.0, ans=0.125 2024-08-14 02:53:32,710 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS 2024-08-14 02:53:32,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2446130.0, ans=0.0 2024-08-14 02:53:37,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12750, loss[loss=0.0957, beats_loss=0.01135, ecapa_loss=0.0001627, whisper_loss=0.08273, over 21114.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001589, whisper_loss=0.09084, over 3868897.20 frames. 
], batch size: 88, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:53:39,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-14 02:53:47,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2446230.0, ans=0.0 2024-08-14 02:53:58,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2446330.0, ans=0.125 2024-08-14 02:54:05,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2446430.0, ans=0.125 2024-08-14 02:54:14,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.458e+01 2.855e+01 3.170e+01 1.362e+02, threshold=5.709e+01, percent-clipped=3.0 2024-08-14 02:54:21,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2024-08-14 02:54:32,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2446630.0, ans=0.05 2024-08-14 02:54:34,799 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 from AS 2024-08-14 02:54:42,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12800, loss[loss=0.1147, beats_loss=0.009174, ecapa_loss=0.0001605, whisper_loss=0.1039, over 17359.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001606, whisper_loss=0.0909, over 3865200.30 frames. ], batch size: 70, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:54:48,968 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 13 from LS+wenet, 18 from Vox, 35 from AS 2024-08-14 02:55:08,974 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 13 from Vox, 33 from AS 2024-08-14 02:55:09,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2446930.0, ans=0.125 2024-08-14 02:55:13,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0 2024-08-14 02:55:14,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2446930.0, ans=0.125 2024-08-14 02:55:17,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2446930.0, ans=0.2 2024-08-14 02:55:19,027 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 12 from Vox, 29 from AS 2024-08-14 02:55:31,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.10 vs. limit=22.5 2024-08-14 02:55:40,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2447130.0, ans=0.125 2024-08-14 02:55:45,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2447130.0, ans=0.0 2024-08-14 02:55:46,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2447230.0, ans=0.125 2024-08-14 02:55:47,788 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12850, loss[loss=0.1129, beats_loss=0.007943, ecapa_loss=0.0001603, whisper_loss=0.1034, over 15765.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.0001593, whisper_loss=0.09075, over 3892307.35 frames. 
], batch size: 60, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:55:57,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2447230.0, ans=0.0 2024-08-14 02:56:02,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2447330.0, ans=15.0 2024-08-14 02:56:04,921 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 from AS 2024-08-14 02:56:09,851 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 18 from Vox, 56 from AS 2024-08-14 02:56:23,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.51 vs. limit=22.5 2024-08-14 02:56:23,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.741e+01 3.118e+01 1.301e+02, threshold=5.482e+01, percent-clipped=1.0 2024-08-14 02:56:33,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2447530.0, ans=0.0 2024-08-14 02:56:48,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2447630.0, ans=0.0 2024-08-14 02:56:53,059 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12900, loss[loss=0.09624, beats_loss=0.01098, ecapa_loss=0.0001613, whisper_loss=0.08365, over 22189.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001613, whisper_loss=0.0902, over 3872089.60 frames. ], batch size: 88, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:56:58,107 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 from AS 2024-08-14 02:57:03,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. 
limit=12.0 2024-08-14 02:57:14,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2447830.0, ans=0.125 2024-08-14 02:58:01,624 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 12950, loss[loss=0.09621, beats_loss=0.01075, ecapa_loss=0.0001228, whisper_loss=0.08424, over 16326.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01087, ecapa_loss=0.0001616, whisper_loss=0.08961, over 3862772.12 frames. ], batch size: 61, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:58:12,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2448230.0, ans=0.0 2024-08-14 02:58:18,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2448330.0, ans=0.125 2024-08-14 02:58:32,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.57 vs. limit=22.5 2024-08-14 02:58:36,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-14 02:58:41,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.282e+01 2.587e+01 2.877e+01 4.043e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 02:58:50,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. 
limit=10.0 2024-08-14 02:58:52,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2448530.0, ans=0.1 2024-08-14 02:58:54,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2448530.0, ans=0.125 2024-08-14 02:59:12,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13000, loss[loss=0.1018, beats_loss=0.00953, ecapa_loss=0.000163, whisper_loss=0.09061, over 15052.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.0001613, whisper_loss=0.09101, over 3887863.73 frames. ], batch size: 58, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 02:59:18,052 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 from AS 2024-08-14 02:59:19,506 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 15 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 02:59:19,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2448730.0, ans=0.0 2024-08-14 02:59:21,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2448730.0, ans=0.125 2024-08-14 02:59:34,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. 
limit=15.0 2024-08-14 02:59:37,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2448830.0, ans=0.0 2024-08-14 02:59:38,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2448830.0, ans=10.0 2024-08-14 03:00:04,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2449030.0, ans=10.0 2024-08-14 03:00:05,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2024-08-14 03:00:17,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2024-08-14 03:00:27,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13050, loss[loss=0.1034, beats_loss=0.01217, ecapa_loss=0.0001241, whisper_loss=0.08994, over 19682.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01082, ecapa_loss=0.0001612, whisper_loss=0.09068, over 3886064.70 frames. ], batch size: 75, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:00:28,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2449230.0, ans=0.125 2024-08-14 03:00:32,949 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 21 from Vox, 48 from AS 2024-08-14 03:00:33,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2024-08-14 03:00:56,041 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 03:00:57,373 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
20 from LS+wenet, 27 from Vox, 35 from AS 2024-08-14 03:00:57,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2449330.0, ans=0.125 2024-08-14 03:01:01,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2449430.0, ans=0.0 2024-08-14 03:01:14,449 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 18 from Vox, 20 from AS 2024-08-14 03:01:16,913 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 from AS 2024-08-14 03:01:18,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.546e+01 2.785e+01 3.142e+01 1.124e+02, threshold=5.570e+01, percent-clipped=2.0 2024-08-14 03:01:31,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2449530.0, ans=0.95 2024-08-14 03:02:03,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13100, loss[loss=0.09582, beats_loss=0.0128, ecapa_loss=0.0001207, whisper_loss=0.08181, over 18693.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001596, whisper_loss=0.09122, over 3876842.46 frames. ], batch size: 72, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:02:45,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2449930.0, ans=0.1 2024-08-14 03:02:54,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2449930.0, ans=0.0 2024-08-14 03:03:33,232 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 03:03:35,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2450130.0, ans=0.2 2024-08-14 03:03:53,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13150, loss[loss=0.114, beats_loss=0.01, ecapa_loss=0.0001827, whisper_loss=0.1022, over 22067.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001603, whisper_loss=0.09106, over 3878682.85 frames. ], batch size: 91, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:04:04,005 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 03:04:13,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2450230.0, ans=0.1 2024-08-14 03:04:27,352 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 03:04:52,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. 
limit=6.0 2024-08-14 03:05:06,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2450430.0, ans=0.2 2024-08-14 03:05:09,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.343e+01 2.636e+01 2.918e+01 3.888e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 03:05:14,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2450530.0, ans=0.1 2024-08-14 03:05:20,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2450530.0, ans=0.125 2024-08-14 03:05:30,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2450530.0, ans=0.125 2024-08-14 03:06:01,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2450630.0, ans=0.125 2024-08-14 03:06:08,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13200, loss[loss=0.1082, beats_loss=0.01036, ecapa_loss=0.0001892, whisper_loss=0.09596, over 19490.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001599, whisper_loss=0.09163, over 3888385.03 frames. ], batch size: 80, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:06:10,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2450730.0, ans=0.2 2024-08-14 03:06:14,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0 2024-08-14 03:06:24,222 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
34 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 03:06:24,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2450730.0, ans=0.125 2024-08-14 03:06:37,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2450830.0, ans=0.0 2024-08-14 03:06:45,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-14 03:07:35,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2451030.0, ans=0.0 2024-08-14 03:07:42,661 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 03:07:47,905 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 03:07:52,284 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 03:07:57,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2451130.0, ans=0.125 2024-08-14 03:08:16,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13250, loss[loss=0.1236, beats_loss=0.009705, ecapa_loss=0.0001375, whisper_loss=0.1125, over 24781.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0107, ecapa_loss=0.0001599, whisper_loss=0.09172, over 3897722.14 frames. ], batch size: 95, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:08:39,753 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 03:08:56,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2451330.0, ans=0.1 2024-08-14 03:09:12,375 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 03:09:14,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2451430.0, ans=0.0 2024-08-14 03:09:21,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2451430.0, ans=0.025 2024-08-14 03:09:27,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.488e+01 2.774e+01 3.161e+01 6.895e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 03:10:01,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13300, loss[loss=0.1033, beats_loss=0.01233, ecapa_loss=0.0001851, whisper_loss=0.08911, over 20910.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001608, whisper_loss=0.09152, over 3868438.06 frames. ], batch size: 87, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:10:02,324 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 03:10:37,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2451930.0, ans=0.1 2024-08-14 03:10:40,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2451930.0, ans=0.125 2024-08-14 03:10:49,808 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 03:10:52,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=15.0 2024-08-14 03:10:55,615 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 03:11:13,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.62 vs. 
limit=15.0 2024-08-14 03:11:19,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2452130.0, ans=0.1 2024-08-14 03:11:25,756 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 03:11:26,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13350, loss[loss=0.1023, beats_loss=0.01121, ecapa_loss=0.0001781, whisper_loss=0.08928, over 16130.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001606, whisper_loss=0.09155, over 3883716.24 frames. ], batch size: 67, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:11:27,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2452230.0, ans=0.0 2024-08-14 03:11:42,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2452330.0, ans=0.09899494936611666 2024-08-14 03:11:48,396 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 03:12:12,720 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.377e+01 2.695e+01 3.024e+01 3.722e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-14 03:12:30,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2452630.0, ans=0.1 2024-08-14 03:12:46,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2452630.0, ans=0.2 2024-08-14 03:12:46,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2452630.0, ans=0.0 2024-08-14 03:12:47,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2452730.0, ans=0.04949747468305833 2024-08-14 03:12:48,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13400, loss[loss=0.09513, beats_loss=0.01067, ecapa_loss=0.0001837, whisper_loss=0.08262, over 20066.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001607, whisper_loss=0.09104, over 3870329.94 frames. ], batch size: 83, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:12:56,925 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 03:13:03,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2452830.0, ans=0.125 2024-08-14 03:13:05,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2452830.0, ans=0.125 2024-08-14 03:13:10,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2452830.0, ans=0.0 2024-08-14 03:13:12,944 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 03:13:19,609 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 03:13:24,206 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 03:13:25,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-14 03:13:39,137 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06988123059272766, model_norm_threshold=53.90802001953125 2024-08-14 03:13:39,318 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.512e+05, grad_sumsq=1.512e+05, orig_rms_sq=1.000e+00 2024-08-14 03:13:43,122 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 16 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-14 03:13:53,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2453130.0, ans=0.125 2024-08-14 03:14:09,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13450, loss[loss=0.1119, beats_loss=0.01165, ecapa_loss=0.0001529, whisper_loss=0.09876, over 16484.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001618, whisper_loss=0.09147, over 3888805.49 frames. 
], batch size: 65, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:14:09,834 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.645e+00 2024-08-14 03:14:14,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2453230.0, ans=0.0 2024-08-14 03:14:33,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2453330.0, ans=0.125 2024-08-14 03:14:52,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2453430.0, ans=0.125 2024-08-14 03:14:55,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.489e+01 2.722e+01 3.204e+01 7.714e+02, threshold=5.444e+01, percent-clipped=1.0 2024-08-14 03:15:20,612 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 03:15:27,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13500, loss[loss=0.09092, beats_loss=0.009004, ecapa_loss=0.0001823, whisper_loss=0.08009, over 15116.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.000162, whisper_loss=0.09086, over 3900287.29 frames. ], batch size: 60, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:15:55,053 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 03:15:56,705 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 03:16:08,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2454030.0, ans=0.125 2024-08-14 03:16:14,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2454030.0, ans=0.125 2024-08-14 03:16:15,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2454030.0, ans=0.125 2024-08-14 03:16:24,580 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 32 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 03:16:36,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13550, loss[loss=0.108, beats_loss=0.009896, ecapa_loss=0.0001611, whisper_loss=0.09645, over 17408.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001615, whisper_loss=0.09029, over 3872699.68 frames. ], batch size: 68, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:16:40,743 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 03:16:46,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2454230.0, ans=0.2 2024-08-14 03:16:57,362 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 03:17:12,744 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.332e+01 2.621e+01 2.776e+01 5.086e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-14 03:17:15,425 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 03:17:28,053 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 03:17:40,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2454730.0, ans=0.1 2024-08-14 03:17:41,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13600, loss[loss=0.1075, beats_loss=0.01012, ecapa_loss=0.0001583, whisper_loss=0.09577, over 15716.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001588, whisper_loss=0.09031, over 3845102.30 frames. ], batch size: 63, lr: 3.64e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:17:45,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-08-14 03:17:47,928 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-14 03:17:56,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2454830.0, ans=0.2 2024-08-14 03:17:57,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2454830.0, ans=0.0 2024-08-14 03:18:03,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2454830.0, ans=0.0 2024-08-14 03:18:03,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2454830.0, ans=0.0 2024-08-14 03:18:12,775 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 03:18:16,892 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 03:18:18,199 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 03:18:20,959 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 03:18:26,120 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-14 03:18:46,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13650, loss[loss=0.1087, beats_loss=0.00895, ecapa_loss=0.000189, whisper_loss=0.09786, over 13322.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01093, ecapa_loss=0.0001583, whisper_loss=0.08928, over 3855569.95 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:18:58,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2455230.0, ans=0.125 2024-08-14 03:19:24,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.360e+01 2.649e+01 3.081e+01 1.605e+02, threshold=5.298e+01, percent-clipped=1.0 2024-08-14 03:19:37,838 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 03:19:57,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13700, loss[loss=0.1101, beats_loss=0.01005, ecapa_loss=0.0001611, whisper_loss=0.09843, over 14226.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01086, ecapa_loss=0.0001588, whisper_loss=0.08978, over 3845640.40 frames. ], batch size: 57, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:20:00,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2455730.0, ans=0.0 2024-08-14 03:20:08,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2455730.0, ans=0.0 2024-08-14 03:20:15,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. 
limit=15.0 2024-08-14 03:20:21,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2455830.0, ans=0.125 2024-08-14 03:20:23,454 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 03:20:25,164 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-14 03:20:27,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2455930.0, ans=0.5 2024-08-14 03:20:32,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2455930.0, ans=0.0 2024-08-14 03:20:37,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-14 03:20:43,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2456030.0, ans=0.2 2024-08-14 03:20:45,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2456030.0, ans=0.0 2024-08-14 03:20:57,924 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 03:21:10,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13750, loss[loss=0.09866, beats_loss=0.01276, ecapa_loss=0.0001217, whisper_loss=0.08468, over 14880.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01089, ecapa_loss=0.0001577, whisper_loss=0.08965, over 3854239.33 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:21:12,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.02 vs. 
limit=15.0 2024-08-14 03:21:28,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2456330.0, ans=0.05 2024-08-14 03:21:30,584 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 03:21:51,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2456430.0, ans=0.2 2024-08-14 03:21:54,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.289e+01 2.530e+01 2.894e+01 7.886e+01, threshold=5.061e+01, percent-clipped=1.0 2024-08-14 03:22:00,282 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-14 03:22:06,161 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 03:22:13,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2456630.0, ans=0.125 2024-08-14 03:22:14,795 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 03:22:16,108 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-14 03:22:28,423 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13800, loss[loss=0.1322, beats_loss=0.008523, ecapa_loss=0.0001967, whisper_loss=0.1217, over 13814.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01079, ecapa_loss=0.0001587, whisper_loss=0.0907, over 3852166.90 frames. 
], batch size: 54, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:22:36,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2456730.0, ans=0.125 2024-08-14 03:23:03,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0 2024-08-14 03:23:10,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2456930.0, ans=0.125 2024-08-14 03:23:16,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2457030.0, ans=0.125 2024-08-14 03:23:48,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13850, loss[loss=0.1121, beats_loss=0.0074, ecapa_loss=0.0002, whisper_loss=0.1027, over 17894.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01083, ecapa_loss=0.0001589, whisper_loss=0.09061, over 3843173.74 frames. ], batch size: 72, lr: 3.63e-03, grad_scale: 5.764607523034235e+17 2024-08-14 03:24:00,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2457230.0, ans=0.125 2024-08-14 03:24:06,816 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 03:24:09,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2457330.0, ans=0.1 2024-08-14 03:24:27,478 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 03:24:27,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2457430.0, ans=0.1 2024-08-14 03:24:35,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.485e+01 2.798e+01 3.130e+01 4.713e+02, threshold=5.595e+01, percent-clipped=2.0 2024-08-14 03:24:36,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2457530.0, ans=0.125 2024-08-14 03:24:38,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2024-08-14 03:24:41,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2457530.0, ans=0.2 2024-08-14 03:25:08,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2457630.0, ans=0.125 2024-08-14 03:25:10,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2457730.0, ans=0.125 2024-08-14 03:25:11,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13900, loss[loss=0.1048, beats_loss=0.01133, ecapa_loss=0.0001632, whisper_loss=0.09183, over 19912.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001597, whisper_loss=0.09152, over 3844091.77 frames. ], batch size: 83, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:25:14,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-14 03:25:24,644 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 03:25:25,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. limit=10.0 2024-08-14 03:25:26,592 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 03:25:45,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-08-14 03:25:58,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2457930.0, ans=15.0 2024-08-14 03:26:28,453 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 03:26:34,169 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 13950, loss[loss=0.1036, beats_loss=0.009245, ecapa_loss=0.0001603, whisper_loss=0.09274, over 16330.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001599, whisper_loss=0.09163, over 3851879.70 frames. ], batch size: 63, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:26:44,072 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 03:26:52,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2458330.0, ans=0.125 2024-08-14 03:27:10,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2458430.0, ans=0.0 2024-08-14 03:27:19,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.317e+01 2.641e+01 2.864e+01 9.900e+01, threshold=5.282e+01, percent-clipped=1.0 2024-08-14 03:27:23,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2458530.0, ans=0.1 2024-08-14 03:27:37,573 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 03:27:51,770 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 03:27:52,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14000, loss[loss=0.0884, beats_loss=0.01063, ecapa_loss=0.0001506, whisper_loss=0.07626, over 21244.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01076, ecapa_loss=0.0001589, whisper_loss=0.09165, over 3859153.33 frames. ], batch size: 86, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 03:28:03,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2458730.0, ans=0.2 2024-08-14 03:28:19,927 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
35 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 03:28:38,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2458930.0, ans=15.0 2024-08-14 03:28:40,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2459030.0, ans=0.0 2024-08-14 03:28:59,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2459130.0, ans=0.125 2024-08-14 03:29:06,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2459130.0, ans=0.1 2024-08-14 03:29:09,676 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 03:29:11,137 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 03:29:12,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14050, loss[loss=0.1134, beats_loss=0.009667, ecapa_loss=0.0002156, whisper_loss=0.1016, over 18869.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01078, ecapa_loss=0.0001586, whisper_loss=0.09232, over 3885203.67 frames. 
], batch size: 77, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:29:16,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2459230.0, ans=0.125
2024-08-14 03:29:38,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2459330.0, ans=0.125
2024-08-14 03:29:41,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2459330.0, ans=0.125
2024-08-14 03:29:58,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.432e+01 2.589e+01 2.887e+01 3.706e+01, threshold=5.177e+01, percent-clipped=0.0
2024-08-14 03:30:00,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5
2024-08-14 03:30:10,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0
2024-08-14 03:30:24,362 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS
2024-08-14 03:30:25,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5
2024-08-14 03:30:27,666 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 from AS
2024-08-14 03:30:29,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2459630.0, ans=0.125
2024-08-14 03:30:31,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14100, loss[loss=0.1126, beats_loss=0.0101, ecapa_loss=0.0001458, whisper_loss=0.101, over 22917.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01083, ecapa_loss=0.0001586, whisper_loss=0.0918, over 3912328.49 frames. ], batch size: 91, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:30:40,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2459730.0, ans=0.125
2024-08-14 03:31:00,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2459830.0, ans=0.125
2024-08-14 03:31:09,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0
2024-08-14 03:31:13,642 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 from AS
2024-08-14 03:31:22,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2460030.0, ans=0.1
2024-08-14 03:31:26,411 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS
2024-08-14 03:31:33,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2460030.0, ans=0.04949747468305833
2024-08-14 03:31:37,927 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 25 from LS+wenet, 11 from Vox, 19 from AS
2024-08-14 03:31:41,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=12.0
2024-08-14 03:31:45,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2460130.0, ans=0.0
2024-08-14 03:31:52,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14150, loss[loss=0.0865, beats_loss=0.01083, ecapa_loss=0.0001889, whisper_loss=0.07378, over 14396.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01095, ecapa_loss=0.0001573, whisper_loss=0.09082, over 3903824.30 frames. ], batch size: 61, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:31:54,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2460230.0, ans=0.125
2024-08-14 03:32:10,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0
2024-08-14 03:32:14,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0
2024-08-14 03:32:19,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2460330.0, ans=0.2
2024-08-14 03:32:24,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2460430.0, ans=0.0
2024-08-14 03:32:28,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2460430.0, ans=0.0
2024-08-14 03:32:39,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2460430.0, ans=0.2
2024-08-14 03:32:40,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.331e+01 2.560e+01 2.927e+01 4.747e+01, threshold=5.121e+01, percent-clipped=0.0
2024-08-14 03:32:44,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.22 vs. limit=22.5
2024-08-14 03:33:01,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2460630.0, ans=0.0
2024-08-14 03:33:16,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14200, loss[loss=0.1, beats_loss=0.009099, ecapa_loss=0.0002074, whisper_loss=0.08883, over 20489.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01093, ecapa_loss=0.0001576, whisper_loss=0.0906, over 3915478.97 frames. ], batch size: 86, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:33:21,873 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-14 03:33:35,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2460830.0, ans=0.0
2024-08-14 03:33:44,767 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 from AS
2024-08-14 03:33:52,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2460930.0, ans=0.0
2024-08-14 03:33:53,085 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 27 from Vox, 40 from AS
2024-08-14 03:34:01,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2460930.0, ans=10.0
2024-08-14 03:34:07,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2461030.0, ans=0.125
2024-08-14 03:34:13,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2461030.0, ans=0.2
2024-08-14 03:34:40,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14250, loss[loss=0.1114, beats_loss=0.01247, ecapa_loss=0.0001781, whisper_loss=0.09719, over 15273.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001573, whisper_loss=0.09054, over 3952045.41 frames. ], batch size: 63, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:34:42,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5
2024-08-14 03:34:52,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2461230.0, ans=0.5
2024-08-14 03:34:57,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2461330.0, ans=0.125
2024-08-14 03:35:02,227 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 21 from Vox, 31 from AS
2024-08-14 03:35:04,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0
2024-08-14 03:35:06,647 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 03:35:10,328 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 from AS
2024-08-14 03:35:23,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2461430.0, ans=0.0
2024-08-14 03:35:25,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2461430.0, ans=0.125
2024-08-14 03:35:25,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.288e+01 2.518e+01 2.897e+01 5.060e+01, threshold=5.036e+01, percent-clipped=0.0
2024-08-14 03:35:26,289 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 from AS
2024-08-14 03:35:29,062 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS
2024-08-14 03:35:31,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2461530.0, ans=0.1
2024-08-14 03:35:59,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14300, loss[loss=0.09773, beats_loss=0.01229, ecapa_loss=0.000152, whisper_loss=0.08392, over 21826.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01093, ecapa_loss=0.0001572, whisper_loss=0.08971, over 3938272.81 frames. ], batch size: 91, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:36:01,978 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 12 from Vox, 34 from AS
2024-08-14 03:36:09,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2461730.0, ans=0.5
2024-08-14 03:36:31,343 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS
2024-08-14 03:36:34,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2461930.0, ans=0.1
2024-08-14 03:36:36,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2461930.0, ans=0.125
2024-08-14 03:36:36,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2461930.0, ans=0.1
2024-08-14 03:36:37,572 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 from AS
2024-08-14 03:36:44,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2461930.0, ans=0.0
2024-08-14 03:36:46,268 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 18 from Vox, 34 from AS
2024-08-14 03:36:53,373 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 03:37:09,178 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 from AS
2024-08-14 03:37:15,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2462130.0, ans=0.0
2024-08-14 03:37:18,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14350, loss[loss=0.0827, beats_loss=0.0118, ecapa_loss=0.0002027, whisper_loss=0.06888, over 15096.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001572, whisper_loss=0.09009, over 3940653.19 frames. ], batch size: 66, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:37:31,830 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 from AS
2024-08-14 03:37:32,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2462230.0, ans=0.125
2024-08-14 03:38:01,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2462430.0, ans=0.125
2024-08-14 03:38:03,863 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.518e+01 2.731e+01 3.066e+01 7.073e+01, threshold=5.463e+01, percent-clipped=1.0
2024-08-14 03:38:04,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2462530.0, ans=0.125
2024-08-14 03:38:04,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-08-14 03:38:10,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0
2024-08-14 03:38:16,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2462530.0, ans=0.1
2024-08-14 03:38:18,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2462530.0, ans=0.09899494936611666
2024-08-14 03:38:20,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2462630.0, ans=0.125
2024-08-14 03:38:20,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2462630.0, ans=0.1
2024-08-14 03:38:24,457 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 03:38:31,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2462630.0, ans=0.0
2024-08-14 03:38:36,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14400, loss[loss=0.09251, beats_loss=0.01228, ecapa_loss=0.0001435, whisper_loss=0.07879, over 22500.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01081, ecapa_loss=0.0001577, whisper_loss=0.09024, over 3925135.23 frames. ], batch size: 92, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:38:44,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2462730.0, ans=0.125
2024-08-14 03:38:49,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2462730.0, ans=0.1
2024-08-14 03:39:12,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2462930.0, ans=0.125
2024-08-14 03:39:15,570 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 26 from LS+wenet, 10 from Vox, 19 from AS
2024-08-14 03:39:21,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2462930.0, ans=0.0
2024-08-14 03:39:26,547 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 from AS
2024-08-14 03:39:32,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2463030.0, ans=0.0
2024-08-14 03:39:43,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2463130.0, ans=0.125
2024-08-14 03:39:44,649 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 23 from Vox, 22 from AS
2024-08-14 03:39:53,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 17, batch 14450, loss[loss=0.1089, beats_loss=0.009185, ecapa_loss=0.0001669, whisper_loss=0.09809, over 17527.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01079, ecapa_loss=0.0001595, whisper_loss=0.09026, over 3897752.82 frames. ], batch size: 66, lr: 3.63e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:39:53,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2463230.0, ans=0.125
2024-08-14 03:39:57,474 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 25 from Vox, 37 from AS
2024-08-14 03:40:10,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2463330.0, ans=0.1
2024-08-14 03:40:21,431 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS
2024-08-14 03:40:40,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.412e+01 2.642e+01 2.928e+01 4.301e+01, threshold=5.284e+01, percent-clipped=0.0
2024-08-14 03:40:43,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2463530.0, ans=0.0
2024-08-14 03:41:52,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 0, loss[loss=0.09589, beats_loss=0.0117, ecapa_loss=0.0001701, whisper_loss=0.08249, over 17829.00 frames. ], tot_loss[loss=0.09589, beats_loss=0.0117, ecapa_loss=0.0001701, whisper_loss=0.08249, over 17829.00 frames. ], batch size: 72, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:41:52,896 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-14 03:42:32,759 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005528, whisper_loss=0.2483, over 922467.00 frames.
2024-08-14 03:42:48,647 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on SV_voxceleb1: loss=0.004396, beats_loss=0, ecapa_loss=0.0004396, whisper_loss=0, over 939242.00 frames.
2024-08-14 03:44:37,220 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 03:44:37,223 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB
2024-08-14 03:44:42,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5
2024-08-14 03:44:42,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0
2024-08-14 03:44:47,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.84 vs. limit=10.0
2024-08-14 03:44:59,230 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 03:45:36,564 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 03:46:30,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0
2024-08-14 03:46:32,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0
2024-08-14 03:46:38,348 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 from AS
2024-08-14 03:46:40,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 50, loss[loss=0.08563, beats_loss=0.01017, ecapa_loss=0.0001603, whisper_loss=0.07386, over 15452.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01011, ecapa_loss=0.0001619, whisper_loss=0.08939, over 904059.05 frames. ], batch size: 61, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:46:53,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2464220.0, ans=0.1
2024-08-14 03:47:28,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2464420.0, ans=0.125
2024-08-14 03:47:36,769 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 03:47:47,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.625e+01 2.934e+01 3.274e+01 1.725e+02, threshold=5.869e+01, percent-clipped=1.0
2024-08-14 03:47:47,748 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 from AS
2024-08-14 03:47:52,137 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 from AS
2024-08-14 03:47:56,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2464520.0, ans=0.125
2024-08-14 03:48:27,392 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS
2024-08-14 03:48:31,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 100, loss[loss=0.1044, beats_loss=0.008977, ecapa_loss=0.0001474, whisper_loss=0.09396, over 19738.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.009765, ecapa_loss=0.0001607, whisper_loss=0.09112, over 1543693.54 frames. ], batch size: 77, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:48:43,049 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 from AS
2024-08-14 03:48:54,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2464820.0, ans=0.0
2024-08-14 03:48:58,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2464820.0, ans=0.125
2024-08-14 03:49:07,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2464820.0, ans=0.2
2024-08-14 03:49:14,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2464920.0, ans=0.125
2024-08-14 03:49:21,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2464920.0, ans=0.0
2024-08-14 03:49:54,801 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 from AS
2024-08-14 03:50:14,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 150, loss[loss=0.07636, beats_loss=0.01056, ecapa_loss=0.0001354, whisper_loss=0.06444, over 16979.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.009745, ecapa_loss=0.0001602, whisper_loss=0.09177, over 2060375.63 frames. ], batch size: 65, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:50:16,334 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 from AS
2024-08-14 03:50:20,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2465220.0, ans=0.1
2024-08-14 03:50:30,077 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 13 from Vox, 28 from AS
2024-08-14 03:50:38,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2465320.0, ans=0.125
2024-08-14 03:50:44,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2465320.0, ans=0.125
2024-08-14 03:50:46,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2465420.0, ans=0.0
2024-08-14 03:50:54,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2465420.0, ans=0.125
2024-08-14 03:51:01,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2465520.0, ans=0.125
2024-08-14 03:51:02,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.708e+01 3.001e+01 3.363e+01 1.526e+02, threshold=6.002e+01, percent-clipped=2.0
2024-08-14 03:51:04,359 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 from AS
2024-08-14 03:51:05,973 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 03:51:06,957 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 03:51:11,684 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 from AS
2024-08-14 03:51:27,371 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 32 from Vox, 31 from AS
2024-08-14 03:51:33,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 200, loss[loss=0.1028, beats_loss=0.01104, ecapa_loss=0.0001585, whisper_loss=0.09017, over 17360.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.009997, ecapa_loss=0.0001615, whisper_loss=0.09167, over 2464092.79 frames. ], batch size: 68, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:51:37,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2465720.0, ans=0.125
2024-08-14 03:51:59,121 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 19 from Vox, 21 from AS
2024-08-14 03:52:15,856 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 03:52:30,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5
2024-08-14 03:52:31,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2466020.0, ans=0.0
2024-08-14 03:52:40,529 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 23 from Vox, 44 from AS
2024-08-14 03:52:52,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2466120.0, ans=0.0
2024-08-14 03:52:55,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 250, loss[loss=0.09799, beats_loss=0.01055, ecapa_loss=0.000203, whisper_loss=0.08541, over 21001.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01026, ecapa_loss=0.0001606, whisper_loss=0.09096, over 2798941.18 frames. ], batch size: 91, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:52:59,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2466220.0, ans=0.125
2024-08-14 03:53:09,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2466320.0, ans=0.125
2024-08-14 03:53:10,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5
2024-08-14 03:53:16,142 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 24 from Vox, 18 from AS
2024-08-14 03:53:22,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2466320.0, ans=0.0
2024-08-14 03:53:39,953 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-14 03:53:42,759 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 39 from LS+wenet, 22 from Vox, 28 from AS
2024-08-14 03:53:46,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.439e+01 2.692e+01 3.141e+01 8.859e+01, threshold=5.385e+01, percent-clipped=1.0
2024-08-14 03:53:55,055 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 from AS
2024-08-14 03:54:12,377 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 from AS
2024-08-14 03:54:19,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 300, loss[loss=0.1049, beats_loss=0.01121, ecapa_loss=0.0001379, whisper_loss=0.09232, over 18101.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.000159, whisper_loss=0.09069, over 3028969.89 frames. ], batch size: 68, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:54:34,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2466720.0, ans=0.125
2024-08-14 03:54:39,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2024-08-14 03:55:07,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2466920.0, ans=10.0
2024-08-14 03:55:08,106 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 from AS
2024-08-14 03:55:35,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5
2024-08-14 03:55:39,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 350, loss[loss=0.108, beats_loss=0.008791, ecapa_loss=0.00016, whisper_loss=0.09765, over 15792.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001578, whisper_loss=0.09019, over 3182222.54 frames. ], batch size: 59, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:55:39,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0
2024-08-14 03:56:16,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-08-14 03:56:18,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2467420.0, ans=0.1
2024-08-14 03:56:25,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.373e+01 2.538e+01 2.756e+01 1.193e+02, threshold=5.077e+01, percent-clipped=2.0
2024-08-14 03:56:32,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2467520.0, ans=0.1
2024-08-14 03:56:39,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2467620.0, ans=0.125
2024-08-14 03:56:42,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2467620.0, ans=0.125
2024-08-14 03:56:43,609 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 03:56:46,563 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS
2024-08-14 03:56:55,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 400, loss[loss=0.09016, beats_loss=0.01385, ecapa_loss=0.0001524, whisper_loss=0.07479, over 18975.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001572, whisper_loss=0.08992, over 3321731.93 frames. ], batch size: 76, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:56:57,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2467720.0, ans=0.125
2024-08-14 03:57:00,061 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 15 from Vox, 35 from AS
2024-08-14 03:57:16,246 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 from AS
2024-08-14 03:57:28,890 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 10 from Vox, 26 from AS
2024-08-14 03:57:32,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2467920.0, ans=15.0
2024-08-14 03:57:41,459 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 from AS
2024-08-14 03:57:44,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.21 vs. limit=22.5
2024-08-14 03:58:11,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 450, loss[loss=0.1038, beats_loss=0.01097, ecapa_loss=0.0001832, whisper_loss=0.09104, over 19074.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001571, whisper_loss=0.09012, over 3454736.88 frames. ], batch size: 77, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:58:22,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2468220.0, ans=0.05
2024-08-14 03:58:33,002 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 from AS
2024-08-14 03:58:47,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0
2024-08-14 03:58:52,907 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 from AS
2024-08-14 03:58:54,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2468420.0, ans=0.125
2024-08-14 03:58:56,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2468520.0, ans=0.1
2024-08-14 03:58:57,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.249e+01 2.491e+01 2.829e+01 3.988e+01, threshold=4.982e+01, percent-clipped=0.0
2024-08-14 03:58:58,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2468520.0, ans=0.0
2024-08-14 03:59:04,968 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 23 from Vox, 17 from AS
2024-08-14 03:59:12,214 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-14 03:59:23,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2468620.0, ans=0.0
2024-08-14 03:59:28,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 500, loss[loss=0.1014, beats_loss=0.0118, ecapa_loss=0.0001554, whisper_loss=0.08806, over 20147.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01053, ecapa_loss=0.0001571, whisper_loss=0.09042, over 3539260.81 frames. ], batch size: 79, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 03:59:30,952 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 from AS
2024-08-14 03:59:41,149 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 27 from Vox, 26 from AS
2024-08-14 03:59:49,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2468820.0, ans=0.125
2024-08-14 03:59:55,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=12.0
2024-08-14 04:00:04,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.07 vs. limit=15.0
2024-08-14 04:00:08,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0
2024-08-14 04:00:14,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2469020.0, ans=0.09899494936611666
2024-08-14 04:00:21,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2469020.0, ans=0.125
2024-08-14 04:00:45,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2469220.0, ans=0.0
2024-08-14 04:00:45,702 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 550, loss[loss=0.07686, beats_loss=0.0121, ecapa_loss=0.0001688, whisper_loss=0.06308, over 20672.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001564, whisper_loss=0.08971, over 3593611.89 frames. ], batch size: 87, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 04:01:19,326 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 14 from Vox, 33 from AS
2024-08-14 04:01:25,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2469420.0, ans=0.125
2024-08-14 04:01:27,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0
2024-08-14 04:01:31,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2469520.0, ans=0.2
2024-08-14 04:01:32,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.764e+01 3.145e+01 1.301e+02, threshold=5.528e+01, percent-clipped=2.0
2024-08-14 04:01:37,395 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 04:01:48,899 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 10 from Vox, 28 from AS
2024-08-14 04:01:56,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2469620.0, ans=0.0
2024-08-14 04:02:01,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 600, loss[loss=0.127, beats_loss=0.01048, ecapa_loss=0.0001488, whisper_loss=0.115, over 23445.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001552, whisper_loss=0.09059, over 3650765.86 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 04:02:06,540 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS
2024-08-14 04:02:12,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2469720.0, ans=0.125
2024-08-14 04:02:21,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.15 vs. limit=22.5
2024-08-14 04:02:28,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2469820.0, ans=0.035
2024-08-14 04:02:41,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2469920.0, ans=0.0
2024-08-14 04:03:01,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2470120.0, ans=0.1
2024-08-14 04:03:13,367 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 29 from Vox, 29 from AS
2024-08-14 04:03:14,790 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 04:03:15,897 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 650, loss[loss=0.09167, beats_loss=0.01308, ecapa_loss=0.0002032, whisper_loss=0.07657, over 15328.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001558, whisper_loss=0.08997, over 3712259.48 frames. ], batch size: 64, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 04:03:16,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2470220.0, ans=0.125
2024-08-14 04:03:23,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2470220.0, ans=0.0
2024-08-14 04:03:25,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2470220.0, ans=0.0
2024-08-14 04:03:25,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0
2024-08-14 04:03:35,630 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 04:03:38,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.38 vs.
limit=10.0 2024-08-14 04:03:54,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2470420.0, ans=0.1 2024-08-14 04:04:00,910 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 04:04:02,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.415e+01 2.559e+01 3.017e+01 4.730e+01, threshold=5.119e+01, percent-clipped=1.0 2024-08-14 04:04:04,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-08-14 04:04:19,555 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:04:21,924 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-14 04:04:32,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 700, loss[loss=0.09164, beats_loss=0.0116, ecapa_loss=0.0001604, whisper_loss=0.07843, over 19620.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001568, whisper_loss=0.0901, over 3720741.88 frames. ], batch size: 80, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:04:46,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2470820.0, ans=0.1 2024-08-14 04:04:49,341 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 04:04:51,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2470820.0, ans=0.125 2024-08-14 04:04:51,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. 
limit=15.0 2024-08-14 04:04:53,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2470820.0, ans=0.07 2024-08-14 04:05:15,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-08-14 04:05:22,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2471020.0, ans=0.2 2024-08-14 04:05:23,669 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-14 04:05:35,865 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 04:05:47,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 750, loss[loss=0.09956, beats_loss=0.01008, ecapa_loss=0.0001787, whisper_loss=0.08769, over 21895.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001568, whisper_loss=0.09039, over 3770593.78 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:05:58,104 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 04:06:07,212 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 04:06:19,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2471420.0, ans=0.125 2024-08-14 04:06:26,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2471420.0, ans=0.125 2024-08-14 04:06:29,088 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
20 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-14 04:06:31,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.490e+01 2.820e+01 4.318e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-14 04:06:52,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2471620.0, ans=0.125 2024-08-14 04:06:53,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2471620.0, ans=0.0 2024-08-14 04:07:01,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 800, loss[loss=0.1138, beats_loss=0.008772, ecapa_loss=0.0001663, whisper_loss=0.1034, over 20460.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001558, whisper_loss=0.09024, over 3783175.56 frames. ], batch size: 84, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:07:10,561 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 04:07:15,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2471720.0, ans=0.125 2024-08-14 04:07:35,747 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 04:08:05,895 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 04:08:07,478 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 23 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-14 04:08:17,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 850, loss[loss=0.09272, beats_loss=0.01041, ecapa_loss=0.000138, whisper_loss=0.08093, over 17979.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001558, whisper_loss=0.08994, over 3796597.82 frames. 
], batch size: 68, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:08:18,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2472220.0, ans=0.0 2024-08-14 04:08:47,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2472420.0, ans=0.0 2024-08-14 04:08:47,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2472420.0, ans=0.125 2024-08-14 04:08:50,045 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 04:08:50,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2472420.0, ans=0.04949747468305833 2024-08-14 04:09:01,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.448e+01 2.671e+01 3.055e+01 4.887e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-14 04:09:09,439 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 04:09:16,162 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 04:09:30,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-14 04:09:33,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 900, loss[loss=0.1006, beats_loss=0.0115, ecapa_loss=0.0001647, whisper_loss=0.08745, over 19307.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001549, whisper_loss=0.09038, over 3805244.21 frames. ], batch size: 77, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:09:36,817 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 04:09:55,516 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 04:10:16,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2472920.0, ans=0.125 2024-08-14 04:10:24,657 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 04:10:27,890 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 04:10:29,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2473020.0, ans=0.0 2024-08-14 04:10:30,907 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 04:10:50,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 950, loss[loss=0.1178, beats_loss=0.009891, ecapa_loss=0.0001382, whisper_loss=0.1065, over 23681.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001548, whisper_loss=0.09032, over 3824043.85 frames. 
], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:10:57,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2473220.0, ans=0.0 2024-08-14 04:11:17,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2473320.0, ans=0.125 2024-08-14 04:11:25,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2473420.0, ans=0.5 2024-08-14 04:11:35,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.279e+01 2.588e+01 3.016e+01 4.728e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 04:12:04,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1000, loss[loss=0.09104, beats_loss=0.01405, ecapa_loss=0.0001141, whisper_loss=0.07584, over 22706.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001535, whisper_loss=0.08938, over 3808891.77 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:12:08,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-08-14 04:12:09,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2473720.0, ans=0.125 2024-08-14 04:12:21,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2024-08-14 04:12:32,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2473820.0, ans=0.125 2024-08-14 04:12:42,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.86 vs. 
limit=15.0 2024-08-14 04:12:48,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2473920.0, ans=0.0 2024-08-14 04:12:50,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2474020.0, ans=0.0 2024-08-14 04:13:21,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1050, loss[loss=0.1056, beats_loss=0.01034, ecapa_loss=0.0001993, whisper_loss=0.0933, over 18269.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001541, whisper_loss=0.08917, over 3827771.77 frames. ], batch size: 78, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:13:35,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2474220.0, ans=0.2 2024-08-14 04:13:37,806 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 29 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 04:13:43,806 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 04:13:44,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=12.0 2024-08-14 04:13:50,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2474320.0, ans=0.125 2024-08-14 04:13:54,676 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 04:13:58,958 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 04:14:05,516 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 04:14:08,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.402e+01 2.807e+01 3.075e+01 7.896e+01, threshold=5.614e+01, percent-clipped=1.0 2024-08-14 04:14:08,464 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 04:14:16,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2024-08-14 04:14:38,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1100, loss[loss=0.0768, beats_loss=0.01143, ecapa_loss=0.0001426, whisper_loss=0.06395, over 14942.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.0001532, whisper_loss=0.08939, over 3810729.68 frames. ], batch size: 59, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:14:47,498 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 04:14:50,366 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 04:14:58,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2024-08-14 04:14:59,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2474820.0, ans=0.0 2024-08-14 04:15:06,652 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 04:15:11,048 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 04:15:28,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2475020.0, ans=0.125 2024-08-14 04:15:34,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2475020.0, ans=0.0 2024-08-14 04:15:50,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2475120.0, ans=0.2 2024-08-14 04:15:52,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1150, loss[loss=0.1197, beats_loss=0.008175, ecapa_loss=0.0001588, whisper_loss=0.11, over 16259.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.000152, whisper_loss=0.09015, over 3817304.86 frames. ], batch size: 59, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:15:59,047 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 04:16:00,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2475220.0, ans=0.2 2024-08-14 04:16:13,985 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 04:16:22,945 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 04:16:34,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. 
limit=15.0 2024-08-14 04:16:38,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.338e+01 2.593e+01 2.937e+01 5.602e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 04:16:41,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2475520.0, ans=0.125 2024-08-14 04:17:01,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2475620.0, ans=0.1 2024-08-14 04:17:07,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1200, loss[loss=0.1012, beats_loss=0.01018, ecapa_loss=0.0001512, whisper_loss=0.08953, over 22078.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01068, ecapa_loss=0.0001529, whisper_loss=0.08954, over 3807451.23 frames. ], batch size: 87, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:17:15,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2475720.0, ans=0.125 2024-08-14 04:17:46,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=12.0 2024-08-14 04:17:51,201 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 33 from Vox, 38 fro AS 2024-08-14 04:17:51,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2476020.0, ans=0.2 2024-08-14 04:18:05,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. 
limit=10.0 2024-08-14 04:18:20,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2476220.0, ans=0.1 2024-08-14 04:18:21,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1250, loss[loss=0.1121, beats_loss=0.0108, ecapa_loss=0.0001178, whisper_loss=0.1002, over 22725.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001532, whisper_loss=0.08955, over 3797243.11 frames. ], batch size: 84, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:18:30,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2476220.0, ans=0.0 2024-08-14 04:18:36,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2476320.0, ans=0.125 2024-08-14 04:18:40,330 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 04:18:40,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2476320.0, ans=0.0 2024-08-14 04:18:43,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-14 04:18:46,219 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-14 04:18:59,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2476420.0, ans=0.2 2024-08-14 04:19:07,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.365e+01 2.557e+01 2.889e+01 4.348e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 04:19:11,953 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 04:19:29,955 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
25 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-14 04:19:30,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2476620.0, ans=0.125 2024-08-14 04:19:33,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2476620.0, ans=0.1 2024-08-14 04:19:38,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1300, loss[loss=0.1018, beats_loss=0.007787, ecapa_loss=0.0001935, whisper_loss=0.09206, over 17135.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01069, ecapa_loss=0.0001541, whisper_loss=0.08924, over 3801117.45 frames. ], batch size: 65, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:19:58,505 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 04:20:09,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2476920.0, ans=0.2 2024-08-14 04:20:15,487 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 04:20:23,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2477020.0, ans=0.2 2024-08-14 04:20:24,694 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 04:20:29,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2477020.0, ans=0.0 2024-08-14 04:20:40,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2477120.0, ans=0.125 2024-08-14 04:20:50,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.68 vs. 
limit=15.0 2024-08-14 04:20:52,930 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 04:20:53,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2477120.0, ans=10.0 2024-08-14 04:20:55,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1350, loss[loss=0.09684, beats_loss=0.0122, ecapa_loss=0.0001163, whisper_loss=0.08348, over 22635.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001536, whisper_loss=0.08963, over 3807198.05 frames. ], batch size: 87, lr: 3.52e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 04:21:02,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2477220.0, ans=0.0 2024-08-14 04:21:13,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2477320.0, ans=0.125 2024-08-14 04:21:32,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2477420.0, ans=0.125 2024-08-14 04:21:41,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.297e+01 2.516e+01 2.764e+01 4.025e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 04:21:41,719 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 04:21:43,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2477520.0, ans=0.125 2024-08-14 04:22:03,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.19 vs. 
limit=15.0 2024-08-14 04:22:11,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1400, loss[loss=0.0988, beats_loss=0.01306, ecapa_loss=0.000148, whisper_loss=0.08426, over 18940.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.0001528, whisper_loss=0.08956, over 3825676.39 frames. ], batch size: 77, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:22:16,534 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 04:22:27,967 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 04:22:54,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2477920.0, ans=0.125 2024-08-14 04:23:01,869 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 04:23:12,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2478120.0, ans=0.125 2024-08-14 04:23:12,797 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.449e-01 2024-08-14 04:24:06,599 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1450, loss[loss=0.1134, beats_loss=0.009208, ecapa_loss=0.0001408, whisper_loss=0.1028, over 22940.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001527, whisper_loss=0.08943, over 3822717.06 frames. ], batch size: 88, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:24:08,586 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
26 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-14 04:24:13,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2478220.0, ans=0.05 2024-08-14 04:24:39,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2478420.0, ans=0.2 2024-08-14 04:24:55,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.310e+01 2.554e+01 2.920e+01 4.164e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-14 04:25:29,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1500, loss[loss=0.0934, beats_loss=0.0102, ecapa_loss=0.0001442, whisper_loss=0.08175, over 17401.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01072, ecapa_loss=0.0001534, whisper_loss=0.08895, over 3826039.20 frames. ], batch size: 69, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:25:32,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-14 04:25:46,114 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 33 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 04:25:48,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2478820.0, ans=0.125 2024-08-14 04:26:07,848 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-14 04:26:21,619 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 04:26:26,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2479020.0, ans=0.0 2024-08-14 04:26:39,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2479120.0, ans=0.125 2024-08-14 04:26:42,511 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 04:26:50,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1550, loss[loss=0.1008, beats_loss=0.01176, ecapa_loss=0.0001156, whisper_loss=0.08792, over 18787.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01074, ecapa_loss=0.0001526, whisper_loss=0.08884, over 3819560.15 frames. ], batch size: 70, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:26:50,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2479220.0, ans=0.125 2024-08-14 04:27:00,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-08-14 04:27:01,151 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 04:27:20,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2479320.0, ans=0.125 2024-08-14 04:27:24,981 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 04:27:31,258 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
10 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 04:27:38,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2479520.0, ans=0.0 2024-08-14 04:27:39,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.208e+01 2.513e+01 2.710e+01 4.785e+01, threshold=5.026e+01, percent-clipped=0.0 2024-08-14 04:28:08,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2479620.0, ans=0.1 2024-08-14 04:28:10,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2479720.0, ans=0.125 2024-08-14 04:28:11,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1600, loss[loss=0.1021, beats_loss=0.01298, ecapa_loss=0.0001061, whisper_loss=0.08806, over 22802.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01077, ecapa_loss=0.0001533, whisper_loss=0.08893, over 3831929.40 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:28:12,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-14 04:28:21,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2479720.0, ans=0.0 2024-08-14 04:28:36,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2479820.0, ans=0.125 2024-08-14 04:28:38,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.30 vs. limit=22.5 2024-08-14 04:28:47,626 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
32 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 04:29:11,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2480020.0, ans=0.1 2024-08-14 04:29:15,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2480120.0, ans=0.125 2024-08-14 04:29:19,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2480120.0, ans=0.0 2024-08-14 04:29:23,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2480120.0, ans=0.025 2024-08-14 04:29:27,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2480120.0, ans=0.125 2024-08-14 04:29:31,736 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1650, loss[loss=0.1057, beats_loss=0.0118, ecapa_loss=0.0001653, whisper_loss=0.09224, over 20696.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01072, ecapa_loss=0.0001535, whisper_loss=0.0897, over 3827067.77 frames. ], batch size: 82, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:29:36,683 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 8 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 04:29:46,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2480320.0, ans=0.125 2024-08-14 04:29:50,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2480320.0, ans=0.2 2024-08-14 04:30:12,280 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 04:30:13,940 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 04:30:14,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2480420.0, ans=0.125 2024-08-14 04:30:17,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.355e+01 2.575e+01 2.902e+01 4.492e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 04:30:17,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2480520.0, ans=0.0 2024-08-14 04:30:22,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2480520.0, ans=0.05 2024-08-14 04:30:31,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2480620.0, ans=0.0 2024-08-14 04:30:32,422 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 04:30:41,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2480620.0, ans=0.125 2024-08-14 04:30:46,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1700, loss[loss=0.08337, beats_loss=0.01169, ecapa_loss=0.0001534, whisper_loss=0.07015, over 18477.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.09004, over 3815481.99 frames. ], batch size: 74, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:30:52,227 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 04:30:54,959 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 04:31:08,463 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 04:31:40,271 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 04:32:00,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1750, loss[loss=0.1036, beats_loss=0.01251, ecapa_loss=0.0001646, whisper_loss=0.08947, over 23169.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001526, whisper_loss=0.09107, over 3837512.07 frames. ], batch size: 94, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:32:04,952 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 04:32:08,073 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 04:32:19,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2024-08-14 04:32:29,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2481420.0, ans=0.125 2024-08-14 04:32:34,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2481420.0, ans=0.125 2024-08-14 04:32:38,763 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-14 04:32:44,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.328e+01 2.583e+01 3.000e+01 1.080e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-14 04:32:45,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-08-14 04:32:52,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.04 vs. 
limit=22.5 2024-08-14 04:33:06,410 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 13 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 04:33:13,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1800, loss[loss=0.07283, beats_loss=0.01395, ecapa_loss=0.0001387, whisper_loss=0.05749, over 19433.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.09059, over 3840749.96 frames. ], batch size: 79, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:33:21,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2481720.0, ans=0.0 2024-08-14 04:33:29,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2481820.0, ans=0.0 2024-08-14 04:33:33,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2481820.0, ans=0.5 2024-08-14 04:33:36,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. 
limit=8.0 2024-08-14 04:33:37,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2481820.0, ans=0.0 2024-08-14 04:33:50,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2481920.0, ans=0.1 2024-08-14 04:33:56,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2482020.0, ans=0.125 2024-08-14 04:33:59,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2482020.0, ans=0.125 2024-08-14 04:34:27,618 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1850, loss[loss=0.1058, beats_loss=0.008732, ecapa_loss=0.0002237, whisper_loss=0.09487, over 19544.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001552, whisper_loss=0.09037, over 3833437.92 frames. ], batch size: 78, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:34:33,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2482220.0, ans=0.04949747468305833 2024-08-14 04:34:36,775 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 04:34:45,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2482320.0, ans=0.2 2024-08-14 04:34:48,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2482320.0, ans=0.1 2024-08-14 04:35:13,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.339e+01 2.610e+01 2.958e+01 9.834e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-14 04:35:35,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2482620.0, ans=0.125 2024-08-14 04:35:39,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2482620.0, ans=0.125 2024-08-14 04:35:44,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1900, loss[loss=0.1061, beats_loss=0.007094, ecapa_loss=0.0001835, whisper_loss=0.09715, over 14908.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001541, whisper_loss=0.08952, over 3812244.67 frames. ], batch size: 58, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:35:49,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2482720.0, ans=0.125 2024-08-14 04:35:56,917 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 04:36:37,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2483020.0, ans=0.125 2024-08-14 04:36:40,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2483020.0, ans=0.1 2024-08-14 04:36:40,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2483020.0, ans=0.125 2024-08-14 04:36:49,292 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 04:36:53,471 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 04:36:56,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-14 04:37:01,336 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 1950, loss[loss=0.1169, beats_loss=0.009846, ecapa_loss=0.0001446, whisper_loss=0.1056, over 23298.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001533, whisper_loss=0.08983, over 3829761.73 frames. ], batch size: 90, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:37:09,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2483220.0, ans=0.125 2024-08-14 04:37:20,490 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 04:37:46,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.351e+01 2.542e+01 2.768e+01 3.987e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 04:37:51,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2483520.0, ans=0.2 2024-08-14 04:38:11,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-14 04:38:16,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2000, loss[loss=0.1162, beats_loss=0.008715, ecapa_loss=0.0001951, whisper_loss=0.1056, over 21728.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01067, ecapa_loss=0.0001531, whisper_loss=0.08917, over 3831575.97 frames. ], batch size: 90, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:38:30,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2483720.0, ans=0.125 2024-08-14 04:38:40,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2483820.0, ans=0.05 2024-08-14 04:38:46,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2483820.0, ans=0.125 2024-08-14 04:38:56,002 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.463e-03 2024-08-14 04:39:19,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-14 04:39:37,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2050, loss[loss=0.08901, beats_loss=0.01159, ecapa_loss=0.0001289, whisper_loss=0.07613, over 16156.00 frames. 
], tot_loss[loss=0.1005, beats_loss=0.01069, ecapa_loss=0.0001532, whisper_loss=0.08825, over 3827068.95 frames. ], batch size: 62, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:39:49,246 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 12 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 04:40:19,378 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 34 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 04:40:25,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 2.326e+01 2.679e+01 3.072e+01 5.038e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-14 04:40:40,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2484620.0, ans=0.1 2024-08-14 04:40:55,976 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 04:40:57,147 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2100, loss[loss=0.09323, beats_loss=0.01223, ecapa_loss=0.0001942, whisper_loss=0.07906, over 22015.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01076, ecapa_loss=0.0001521, whisper_loss=0.08843, over 3858681.04 frames. ], batch size: 94, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:41:00,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2024-08-14 04:41:10,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2484820.0, ans=0.125 2024-08-14 04:41:28,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2484920.0, ans=0.0 2024-08-14 04:41:37,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.28 vs. 
limit=15.0 2024-08-14 04:41:39,967 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 04:41:59,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-14 04:42:08,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2485120.0, ans=0.125 2024-08-14 04:42:09,857 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 04:42:15,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2150, loss[loss=0.1247, beats_loss=0.01129, ecapa_loss=0.0001274, whisper_loss=0.1121, over 16353.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01077, ecapa_loss=0.0001529, whisper_loss=0.08891, over 3849062.48 frames. ], batch size: 64, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:42:25,435 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-14 04:42:27,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2485220.0, ans=0.0 2024-08-14 04:42:48,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2485420.0, ans=0.125 2024-08-14 04:42:51,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2485420.0, ans=0.125 2024-08-14 04:42:59,015 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
37 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 04:43:04,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.306e+01 2.493e+01 2.947e+01 5.632e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-14 04:43:25,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0 2024-08-14 04:43:35,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2200, loss[loss=0.08649, beats_loss=0.01083, ecapa_loss=0.0001466, whisper_loss=0.0742, over 15445.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01085, ecapa_loss=0.0001535, whisper_loss=0.08916, over 3867254.72 frames. ], batch size: 60, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:43:48,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-14 04:43:57,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2485820.0, ans=0.0 2024-08-14 04:43:59,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2485820.0, ans=0.0 2024-08-14 04:44:13,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-08-14 04:44:31,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-08-14 04:44:32,431 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-14 04:44:52,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2486120.0, ans=0.125 2024-08-14 04:44:54,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2250, loss[loss=0.09842, beats_loss=0.008687, ecapa_loss=0.0001924, whisper_loss=0.08781, over 14735.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01086, ecapa_loss=0.0001551, whisper_loss=0.08968, over 3876780.03 frames. ], batch size: 58, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:44:56,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2486220.0, ans=0.1 2024-08-14 04:45:01,520 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 04:45:13,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2486320.0, ans=0.125 2024-08-14 04:45:42,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.436e+01 2.743e+01 3.250e+01 7.629e+01, threshold=5.485e+01, percent-clipped=1.0 2024-08-14 04:45:42,793 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 04:46:08,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2486620.0, ans=0.2 2024-08-14 04:46:15,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2300, loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001179, whisper_loss=0.09116, over 17570.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01086, ecapa_loss=0.0001557, whisper_loss=0.09, over 3866730.29 frames. 
], batch size: 66, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:46:21,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-14 04:46:30,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2486820.0, ans=0.1 2024-08-14 04:46:33,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2486820.0, ans=0.125 2024-08-14 04:46:35,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.05 vs. limit=22.5 2024-08-14 04:46:39,120 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 04:47:21,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2487120.0, ans=0.125 2024-08-14 04:47:28,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-14 04:47:29,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2487120.0, ans=0.125 2024-08-14 04:47:34,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2350, loss[loss=0.1419, beats_loss=0.00759, ecapa_loss=0.0001518, whisper_loss=0.1328, over 24955.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001553, whisper_loss=0.09042, over 3849999.08 frames. ], batch size: 92, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:47:39,735 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 04:47:57,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2487320.0, ans=0.0 2024-08-14 04:48:02,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2487320.0, ans=0.125 2024-08-14 04:48:05,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2487420.0, ans=0.125 2024-08-14 04:48:14,775 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-14 04:48:16,193 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 04:48:22,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.359e+01 2.626e+01 3.027e+01 4.535e+02, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 04:48:28,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-14 04:48:31,240 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 04:48:55,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2400, loss[loss=0.09692, beats_loss=0.01125, ecapa_loss=0.0001513, whisper_loss=0.08415, over 21619.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001564, whisper_loss=0.09083, over 3866008.06 frames. ], batch size: 90, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:48:56,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2487720.0, ans=0.2 2024-08-14 04:48:57,030 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 34 from Vox, 29 fro AS 2024-08-14 04:49:10,676 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.913e+05 2024-08-14 04:49:28,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2487920.0, ans=0.0 2024-08-14 04:49:32,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2487920.0, ans=0.0 2024-08-14 04:49:51,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2488020.0, ans=0.0 2024-08-14 04:49:55,149 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 04:50:04,536 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 04:50:14,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2450, loss[loss=0.1003, beats_loss=0.01092, ecapa_loss=0.0001574, whisper_loss=0.08778, over 17599.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001572, whisper_loss=0.09166, over 3881056.91 frames. ], batch size: 70, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:50:23,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2488220.0, ans=0.0 2024-08-14 04:50:36,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2488320.0, ans=0.035 2024-08-14 04:50:40,612 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 31 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 04:50:52,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. 
limit=15.0 2024-08-14 04:50:58,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2488420.0, ans=0.125 2024-08-14 04:51:00,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.556e+01 2.864e+01 5.420e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-14 04:51:04,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2488520.0, ans=0.125 2024-08-14 04:51:22,552 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 04:51:29,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2488620.0, ans=0.0 2024-08-14 04:51:29,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2488620.0, ans=0.1 2024-08-14 04:51:32,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2500, loss[loss=0.1058, beats_loss=0.008804, ecapa_loss=0.0001574, whisper_loss=0.09538, over 17900.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01066, ecapa_loss=0.0001563, whisper_loss=0.09153, over 3913581.63 frames. ], batch size: 70, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:51:43,971 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 04:51:46,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2488720.0, ans=0.2 2024-08-14 04:51:48,781 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 04:51:52,003 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 04:51:52,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-08-14 04:51:59,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2488820.0, ans=0.1 2024-08-14 04:52:06,143 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 04:52:34,825 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 04:52:36,425 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 14 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-14 04:52:42,570 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 04:52:43,864 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 04:52:53,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2550, loss[loss=0.1096, beats_loss=0.01051, ecapa_loss=0.0001273, whisper_loss=0.0978, over 24228.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001557, whisper_loss=0.09085, over 3925322.15 frames. ], batch size: 92, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:52:54,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2489220.0, ans=0.0 2024-08-14 04:53:12,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2489320.0, ans=0.1 2024-08-14 04:53:24,489 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-14 04:53:29,818 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 04:53:38,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2489420.0, ans=0.0 2024-08-14 04:53:43,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.452e+01 2.668e+01 3.104e+01 5.723e+01, threshold=5.337e+01, percent-clipped=1.0 2024-08-14 04:54:11,487 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 04:54:14,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2600, loss[loss=0.1006, beats_loss=0.01091, ecapa_loss=0.000128, whisper_loss=0.08845, over 20316.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001562, whisper_loss=0.09071, over 3897928.85 frames. ], batch size: 77, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:54:37,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2489820.0, ans=0.0 2024-08-14 04:54:37,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2489820.0, ans=0.125 2024-08-14 04:54:43,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0 2024-08-14 04:54:55,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.55 vs. limit=22.5 2024-08-14 04:55:01,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.53 vs. limit=22.5 2024-08-14 04:55:08,109 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 04:55:10,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2490020.0, ans=0.0 2024-08-14 04:55:22,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2490020.0, ans=0.1 2024-08-14 04:55:30,128 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-14 04:55:40,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2490120.0, ans=0.125 2024-08-14 04:55:51,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2650, loss[loss=0.1035, beats_loss=0.008944, ecapa_loss=0.0001409, whisper_loss=0.09317, over 16240.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001567, whisper_loss=0.09059, over 3886867.29 frames. ], batch size: 62, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:55:52,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2490220.0, ans=0.2 2024-08-14 04:56:21,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2490320.0, ans=0.0 2024-08-14 04:56:29,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2490320.0, ans=0.0 2024-08-14 04:56:38,892 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-14 04:56:45,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.391e+01 2.607e+01 2.986e+01 4.430e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-14 04:56:54,223 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 04:56:57,816 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0513666495680809, model_norm_threshold=52.13920593261719 2024-08-14 04:56:58,002 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.106e+05, grad_sumsq=1.106e+05, orig_rms_sq=1.000e+00 2024-08-14 04:57:13,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2490620.0, ans=0.125 2024-08-14 04:57:15,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2490620.0, ans=0.2 2024-08-14 04:57:26,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2700, loss[loss=0.09906, beats_loss=0.01091, ecapa_loss=0.0001423, whisper_loss=0.08673, over 14908.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001577, whisper_loss=0.09052, over 3868604.85 frames. ], batch size: 58, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:57:30,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2490720.0, ans=0.1 2024-08-14 04:57:37,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2490720.0, ans=0.125 2024-08-14 04:57:54,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2490820.0, ans=0.2 2024-08-14 04:58:12,010 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 04:58:23,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. 
limit=15.0 2024-08-14 04:58:25,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2490920.0, ans=0.1 2024-08-14 04:58:50,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2491020.0, ans=0.0 2024-08-14 04:58:55,617 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 04:59:23,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2491120.0, ans=0.125 2024-08-14 04:59:26,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2750, loss[loss=0.1063, beats_loss=0.01157, ecapa_loss=0.0001665, whisper_loss=0.09306, over 23517.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001575, whisper_loss=0.09008, over 3885426.04 frames. ], batch size: 93, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 04:59:28,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2491220.0, ans=10.0 2024-08-14 05:00:04,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2491320.0, ans=0.125 2024-08-14 05:00:20,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=12.0 2024-08-14 05:00:25,613 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
14 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 05:00:37,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.424e+01 2.607e+01 2.892e+01 1.015e+03, threshold=5.215e+01, percent-clipped=3.0 2024-08-14 05:01:00,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2491520.0, ans=0.0 2024-08-14 05:01:03,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2491620.0, ans=0.125 2024-08-14 05:01:08,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-14 05:01:27,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2800, loss[loss=0.09833, beats_loss=0.01262, ecapa_loss=0.0001293, whisper_loss=0.08442, over 20809.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001567, whisper_loss=0.09029, over 3873494.41 frames. ], batch size: 81, lr: 3.51e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:01:35,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-08-14 05:01:57,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2491820.0, ans=0.1 2024-08-14 05:02:38,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2492020.0, ans=0.125 2024-08-14 05:02:48,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2492020.0, ans=0.125 2024-08-14 05:02:52,611 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 05:02:56,414 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
19 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 05:03:02,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2492120.0, ans=0.125 2024-08-14 05:03:10,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2492120.0, ans=0.05 2024-08-14 05:03:20,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2850, loss[loss=0.101, beats_loss=0.0107, ecapa_loss=0.000134, whisper_loss=0.08896, over 18302.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001574, whisper_loss=0.09058, over 3859744.79 frames. ], batch size: 69, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:03:21,253 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 05:03:33,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2492220.0, ans=0.2 2024-08-14 05:03:39,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. 
limit=15.0 2024-08-14 05:03:59,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2492420.0, ans=0.1 2024-08-14 05:04:06,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.327e+01 2.505e+01 2.806e+01 7.430e+01, threshold=5.010e+01, percent-clipped=1.0 2024-08-14 05:04:19,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2492520.0, ans=0.05 2024-08-14 05:04:22,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2492620.0, ans=0.125 2024-08-14 05:04:23,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2024-08-14 05:04:24,449 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 05:04:24,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-14 05:04:27,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2492620.0, ans=0.125 2024-08-14 05:04:37,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2900, loss[loss=0.08934, beats_loss=0.01124, ecapa_loss=0.0001591, whisper_loss=0.07651, over 19261.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001581, whisper_loss=0.09124, over 3858692.07 frames. 
], batch size: 78, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:04:44,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2492720.0, ans=0.1 2024-08-14 05:05:01,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2492820.0, ans=0.125 2024-08-14 05:05:10,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2492920.0, ans=0.125 2024-08-14 05:05:19,042 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 05:05:44,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2493120.0, ans=0.125 2024-08-14 05:05:44,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2493120.0, ans=0.125 2024-08-14 05:05:50,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2493120.0, ans=0.125 2024-08-14 05:05:52,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 2950, loss[loss=0.1098, beats_loss=0.008216, ecapa_loss=0.0001715, whisper_loss=0.09982, over 21796.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001586, whisper_loss=0.09072, over 3863465.82 frames. ], batch size: 85, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:05:55,933 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 05:06:00,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-14 05:06:05,460 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 05:06:08,481 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.382e-02 2024-08-14 05:06:17,587 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-14 05:06:22,067 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 39 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-14 05:06:34,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.417e+01 2.624e+01 2.963e+01 8.640e+01, threshold=5.248e+01, percent-clipped=1.0 2024-08-14 05:06:35,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2493520.0, ans=0.1 2024-08-14 05:06:38,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2493520.0, ans=0.025 2024-08-14 05:06:46,659 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.278e+01 2024-08-14 05:06:52,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2493620.0, ans=0.125 2024-08-14 05:06:59,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=12.0 2024-08-14 05:06:59,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2493620.0, ans=0.1 2024-08-14 05:07:03,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3000, loss[loss=0.1158, beats_loss=0.007325, ecapa_loss=0.0001832, whisper_loss=0.1067, over 17460.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001589, whisper_loss=0.09085, over 3866375.79 frames. 
], batch size: 71, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:07:03,624 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 05:07:44,628 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on ASR_libri: loss=0.2518, beats_loss=0, ecapa_loss=0.0005463, whisper_loss=0.2464, over 922467.00 frames. 2024-08-14 05:08:00,211 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on SV_voxceleb1: loss=0.004304, beats_loss=0, ecapa_loss=0.0004304, whisper_loss=0, over 939242.00 frames. 2024-08-14 05:09:01,804 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6451, 3.1948, 2.5519, 2.2617], device='cuda:3') 2024-08-14 05:10:04,677 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on AT_audioset: loss=0.02354, beats_loss=0.02354, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 05:10:04,680 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 05:10:05,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.49 vs. 
limit=15.0 2024-08-14 05:10:06,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2493720.0, ans=0.1 2024-08-14 05:10:18,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2493820.0, ans=0.125 2024-08-14 05:10:37,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2493920.0, ans=0.025 2024-08-14 05:10:40,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2493920.0, ans=0.125 2024-08-14 05:10:53,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2494020.0, ans=0.125 2024-08-14 05:10:59,188 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 05:11:10,387 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 05:11:17,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3050, loss[loss=0.1066, beats_loss=0.01101, ecapa_loss=0.0001507, whisper_loss=0.09407, over 21508.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001582, whisper_loss=0.0912, over 3865843.10 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:11:26,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2494220.0, ans=0.0 2024-08-14 05:11:36,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2494320.0, ans=0.125 2024-08-14 05:11:42,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. 
limit=15.0 2024-08-14 05:11:59,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.503e+01 2.783e+01 3.185e+01 5.631e+01, threshold=5.566e+01, percent-clipped=1.0 2024-08-14 05:12:10,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2494520.0, ans=0.2 2024-08-14 05:12:13,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2494620.0, ans=0.125 2024-08-14 05:12:17,192 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 05:12:20,119 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 05:12:28,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3100, loss[loss=0.1208, beats_loss=0.009378, ecapa_loss=0.000185, whisper_loss=0.1095, over 22386.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001578, whisper_loss=0.09044, over 3863509.29 frames. ], batch size: 91, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:12:33,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2494720.0, ans=0.125 2024-08-14 05:12:34,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2494720.0, ans=0.0 2024-08-14 05:12:40,351 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 05:12:46,488 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
15 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 05:12:48,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2494820.0, ans=0.0 2024-08-14 05:12:55,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2494820.0, ans=0.1 2024-08-14 05:12:57,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2494920.0, ans=0.09899494936611666 2024-08-14 05:12:58,358 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 05:13:01,590 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 05:13:03,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2494920.0, ans=0.0 2024-08-14 05:13:07,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2494920.0, ans=0.0 2024-08-14 05:13:23,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2495020.0, ans=0.0 2024-08-14 05:13:42,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3150, loss[loss=0.113, beats_loss=0.0107, ecapa_loss=0.0001876, whisper_loss=0.1004, over 17548.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001582, whisper_loss=0.09077, over 3863907.00 frames. 
], batch size: 67, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:13:48,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2495220.0, ans=0.125 2024-08-14 05:13:55,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2495320.0, ans=0.125 2024-08-14 05:14:25,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.581e+01 2.876e+01 7.737e+01, threshold=5.161e+01, percent-clipped=2.0 2024-08-14 05:14:32,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2495520.0, ans=0.0 2024-08-14 05:14:51,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2495620.0, ans=0.05 2024-08-14 05:14:52,815 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 05:14:55,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3200, loss[loss=0.1132, beats_loss=0.01004, ecapa_loss=0.0001713, whisper_loss=0.1014, over 21490.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001586, whisper_loss=0.09144, over 3856325.53 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:15:09,981 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
16 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 05:15:12,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2495820.0, ans=22.5 2024-08-14 05:15:16,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2495820.0, ans=0.0 2024-08-14 05:15:22,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-14 05:15:28,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=12.0 2024-08-14 05:15:48,008 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 05:15:51,058 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-14 05:16:08,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3250, loss[loss=0.0871, beats_loss=0.01196, ecapa_loss=0.0001432, whisper_loss=0.0737, over 20610.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0107, ecapa_loss=0.0001589, whisper_loss=0.09208, over 3886086.53 frames. ], batch size: 81, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:16:46,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-14 05:16:51,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.775e+01 3.145e+01 3.018e+02, threshold=5.551e+01, percent-clipped=3.0 2024-08-14 05:17:01,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs. 
limit=22.5 2024-08-14 05:17:10,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2496620.0, ans=0.0 2024-08-14 05:17:20,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3300, loss[loss=0.1097, beats_loss=0.009844, ecapa_loss=0.0001622, whisper_loss=0.09823, over 19831.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001601, whisper_loss=0.09225, over 3915810.36 frames. ], batch size: 79, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:17:22,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2496720.0, ans=0.125 2024-08-14 05:17:22,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2496720.0, ans=0.125 2024-08-14 05:17:25,310 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 05:17:28,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2496720.0, ans=0.0 2024-08-14 05:17:35,694 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 05:17:45,979 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 05:17:46,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2496820.0, ans=0.125 2024-08-14 05:17:46,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2496820.0, ans=0.0 2024-08-14 05:17:52,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2496920.0, ans=0.07 2024-08-14 05:18:14,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-08-14 05:18:26,433 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 05:18:28,291 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-14 05:18:33,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3350, loss[loss=0.1341, beats_loss=0.008892, ecapa_loss=0.0002038, whisper_loss=0.1232, over 22460.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01061, ecapa_loss=0.00016, whisper_loss=0.09264, over 3903393.03 frames. ], batch size: 94, lr: 3.50e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:18:54,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. 
limit=15.0 2024-08-14 05:18:55,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2497320.0, ans=0.0 2024-08-14 05:19:17,822 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.312e+01 2.517e+01 2.799e+01 4.556e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-14 05:19:23,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-08-14 05:19:47,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3400, loss[loss=0.1231, beats_loss=0.009174, ecapa_loss=0.0001091, whisper_loss=0.1128, over 16914.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01063, ecapa_loss=0.0001576, whisper_loss=0.09252, over 3877561.41 frames. ], batch size: 60, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:19:53,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2497720.0, ans=0.0 2024-08-14 05:20:08,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2497820.0, ans=0.125 2024-08-14 05:20:10,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2497820.0, ans=0.1 2024-08-14 05:20:19,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2497920.0, ans=0.125 2024-08-14 05:20:20,737 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-14 05:20:24,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. 
limit=15.0 2024-08-14 05:20:42,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2498020.0, ans=0.125 2024-08-14 05:20:45,367 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 05:20:58,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2498220.0, ans=0.125 2024-08-14 05:20:59,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3450, loss[loss=0.08487, beats_loss=0.01244, ecapa_loss=0.0001053, whisper_loss=0.07138, over 15690.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01078, ecapa_loss=0.0001584, whisper_loss=0.09086, over 3892100.90 frames. ], batch size: 60, lr: 3.50e-03, grad_scale: 1.152921504606847e+18 2024-08-14 05:21:17,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2498320.0, ans=0.125 2024-08-14 05:21:24,561 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 05:21:28,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.70 vs. limit=22.5 2024-08-14 05:21:29,079 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
40 from LS+wenet, 23 from Vox, 25 from AS
2024-08-14 05:21:33,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2498420.0, ans=0.07
2024-08-14 05:21:43,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.288e+01 2.702e+01 3.056e+01 2.683e+02, threshold=5.405e+01, percent-clipped=1.0
2024-08-14 05:21:48,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2498520.0, ans=0.125
2024-08-14 05:21:58,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2498620.0, ans=0.0
2024-08-14 05:22:09,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0
2024-08-14 05:22:12,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3500, loss[loss=0.112, beats_loss=0.009674, ecapa_loss=0.0001372, whisper_loss=0.1009, over 17499.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001591, whisper_loss=0.09106, over 3859726.10 frames. ], batch size: 66, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:22:15,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2498720.0, ans=0.0
2024-08-14 05:22:20,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2498720.0, ans=0.125
2024-08-14 05:22:46,325 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 from AS
2024-08-14 05:23:11,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2499120.0, ans=0.125
2024-08-14 05:23:19,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2499120.0, ans=0.1
2024-08-14 05:23:20,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2499120.0, ans=0.1
2024-08-14 05:23:25,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3550, loss[loss=0.103, beats_loss=0.01284, ecapa_loss=0.0001442, whisper_loss=0.0887, over 21111.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01077, ecapa_loss=0.0001583, whisper_loss=0.09074, over 3858082.25 frames. ], batch size: 87, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:23:33,094 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 from AS
2024-08-14 05:23:39,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2499320.0, ans=0.0
2024-08-14 05:23:42,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2499320.0, ans=0.2
2024-08-14 05:23:42,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2499320.0, ans=0.125
2024-08-14 05:24:04,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2499420.0, ans=0.125
2024-08-14 05:24:10,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.402e+01 2.607e+01 2.928e+01 5.339e+01, threshold=5.213e+01, percent-clipped=0.0
2024-08-14 05:24:13,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2499520.0, ans=0.125
2024-08-14 05:24:14,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.84 vs. limit=22.5
2024-08-14 05:24:22,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2499520.0, ans=0.125
2024-08-14 05:24:23,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2499620.0, ans=0.0
2024-08-14 05:24:26,723 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 from AS
2024-08-14 05:24:28,062 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 from AS
2024-08-14 05:24:39,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3600, loss[loss=0.1089, beats_loss=0.0113, ecapa_loss=0.0001466, whisper_loss=0.09609, over 19791.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001585, whisper_loss=0.09053, over 3825637.98 frames. ], batch size: 78, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:25:21,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2499920.0, ans=0.2
2024-08-14 05:25:22,567 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 from AS
2024-08-14 05:25:28,302 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS
2024-08-14 05:25:40,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2500120.0, ans=0.0
2024-08-14 05:25:53,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3650, loss[loss=0.09311, beats_loss=0.01062, ecapa_loss=0.000188, whisper_loss=0.08061, over 22235.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001592, whisper_loss=0.09075, over 3832931.07 frames. ], batch size: 95, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:26:08,359 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 from AS
2024-08-14 05:26:19,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2500320.0, ans=0.0
2024-08-14 05:26:38,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.437e+01 2.673e+01 3.010e+01 1.345e+02, threshold=5.347e+01, percent-clipped=1.0
2024-08-14 05:27:07,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3700, loss[loss=0.08329, beats_loss=0.01256, ecapa_loss=0.0001638, whisper_loss=0.06909, over 15467.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01073, ecapa_loss=0.0001589, whisper_loss=0.09115, over 3815773.70 frames. ], batch size: 59, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:27:14,882 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 24 from Vox, 47 from AS
2024-08-14 05:27:16,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2500720.0, ans=0.125
2024-08-14 05:27:39,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2500920.0, ans=0.09899494936611666
2024-08-14 05:27:39,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2500920.0, ans=0.1
2024-08-14 05:27:45,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2500920.0, ans=0.0
2024-08-14 05:27:52,229 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 05:28:03,908 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 05:28:19,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3750, loss[loss=0.08307, beats_loss=0.01385, ecapa_loss=0.000139, whisper_loss=0.06784, over 20062.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001589, whisper_loss=0.09145, over 3839873.20 frames. ], batch size: 83, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:28:29,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.19 vs. limit=10.0
2024-08-14 05:28:33,357 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 14 from Vox, 31 from AS
2024-08-14 05:28:38,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0
2024-08-14 05:28:41,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2501320.0, ans=0.125
2024-08-14 05:28:41,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0
2024-08-14 05:28:42,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2501320.0, ans=0.125
2024-08-14 05:29:03,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.382e+01 2.609e+01 2.989e+01 8.009e+01, threshold=5.218e+01, percent-clipped=2.0
2024-08-14 05:29:22,076 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.271e-02
2024-08-14 05:29:23,025 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 05:29:28,484 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 from AS
2024-08-14 05:29:32,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3800, loss[loss=0.09492, beats_loss=0.01021, ecapa_loss=0.0001577, whisper_loss=0.08313, over 22829.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.0001591, whisper_loss=0.09113, over 3824649.29 frames. ], batch size: 93, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:29:49,346 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 from AS
2024-08-14 05:29:54,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2501820.0, ans=0.125
2024-08-14 05:30:16,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2502020.0, ans=0.025
2024-08-14 05:30:18,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2502020.0, ans=0.1
2024-08-14 05:30:19,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=12.0
2024-08-14 05:30:27,503 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 from AS
2024-08-14 05:30:29,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2502020.0, ans=0.2
2024-08-14 05:30:36,613 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 05:30:46,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3850, loss[loss=0.1065, beats_loss=0.00936, ecapa_loss=0.0001467, whisper_loss=0.09569, over 15349.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001581, whisper_loss=0.09101, over 3851644.85 frames. ], batch size: 59, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:31:02,994 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 from AS
2024-08-14 05:31:06,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2502320.0, ans=0.1
2024-08-14 05:31:15,477 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 23 from LS+wenet, 31 from Vox, 41 from AS
2024-08-14 05:31:27,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2502420.0, ans=0.0
2024-08-14 05:31:29,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2502520.0, ans=0.125
2024-08-14 05:31:29,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.353e+01 2.523e+01 2.870e+01 4.680e+01, threshold=5.047e+01, percent-clipped=0.0
2024-08-14 05:31:49,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2502620.0, ans=0.125
2024-08-14 05:31:49,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2502620.0, ans=0.1
2024-08-14 05:31:51,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0
2024-08-14 05:31:59,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3900, loss[loss=0.1086, beats_loss=0.01045, ecapa_loss=0.0001375, whisper_loss=0.09678, over 19167.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01079, ecapa_loss=0.0001579, whisper_loss=0.09133, over 3862980.93 frames. ], batch size: 75, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:32:11,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2502720.0, ans=0.1
2024-08-14 05:32:23,861 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 05:32:24,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0
2024-08-14 05:32:30,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2502920.0, ans=0.125
2024-08-14 05:32:31,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2502920.0, ans=0.2
2024-08-14 05:32:55,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0
2024-08-14 05:33:09,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2503120.0, ans=0.125
2024-08-14 05:33:11,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2503220.0, ans=0.2
2024-08-14 05:33:12,054 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 3950, loss[loss=0.1085, beats_loss=0.01107, ecapa_loss=0.000138, whisper_loss=0.096, over 22164.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01067, ecapa_loss=0.0001586, whisper_loss=0.09248, over 3913138.49 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:33:27,550 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 05:33:29,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2503320.0, ans=0.0
2024-08-14 05:33:55,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.430e+01 2.817e+01 3.192e+01 2.202e+02, threshold=5.633e+01, percent-clipped=4.0
2024-08-14 05:34:11,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2503620.0, ans=0.0
2024-08-14 05:34:14,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0
2024-08-14 05:34:18,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2503620.0, ans=0.0
2024-08-14 05:34:25,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4000, loss[loss=0.1085, beats_loss=0.01206, ecapa_loss=0.0001199, whisper_loss=0.09525, over 19119.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001587, whisper_loss=0.09125, over 3900729.58 frames. ], batch size: 73, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:34:32,739 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 05:34:34,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2503720.0, ans=0.0
2024-08-14 05:34:35,654 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 12 from LS+wenet, 23 from Vox, 34 from AS
2024-08-14 05:34:47,145 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 from AS
2024-08-14 05:35:37,397 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 from AS
2024-08-14 05:35:38,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4050, loss[loss=0.1026, beats_loss=0.01157, ecapa_loss=0.000141, whisper_loss=0.08958, over 21526.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001603, whisper_loss=0.09119, over 3889733.36 frames. ], batch size: 84, lr: 3.50e-03, grad_scale: 1.152921504606847e+18
2024-08-14 05:35:50,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.00 vs. limit=10.0
2024-08-14 05:36:22,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.679e+01 2.286e+01 2.527e+01 2.897e+01 4.039e+01, threshold=5.053e+01, percent-clipped=0.0
2024-08-14 05:36:23,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2504520.0, ans=0.125
2024-08-14 05:36:49,473 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-14 05:36:51,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4100, loss[loss=0.09913, beats_loss=0.01173, ecapa_loss=0.0001461, whisper_loss=0.08594, over 21371.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01065, ecapa_loss=0.0001616, whisper_loss=0.09211, over 3896815.39 frames. ], batch size: 86, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:37:02,429 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 from AS
2024-08-14 05:37:09,566 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 from AS
2024-08-14 05:37:29,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2504920.0, ans=0.1
2024-08-14 05:37:39,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2505020.0, ans=0.0
2024-08-14 05:37:43,156 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 from AS
2024-08-14 05:37:59,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2505120.0, ans=0.0
2024-08-14 05:38:01,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0
2024-08-14 05:38:04,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4150, loss[loss=0.1216, beats_loss=0.009379, ecapa_loss=0.0001769, whisper_loss=0.1105, over 20181.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0107, ecapa_loss=0.0001605, whisper_loss=0.09213, over 3898083.59 frames. ], batch size: 80, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:38:07,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5
2024-08-14 05:38:11,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2505220.0, ans=0.1
2024-08-14 05:38:15,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2505220.0, ans=0.125
2024-08-14 05:38:17,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2505320.0, ans=0.015
2024-08-14 05:38:42,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2024-08-14 05:38:49,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.395e+01 2.659e+01 2.961e+01 5.291e+01, threshold=5.319e+01, percent-clipped=1.0
2024-08-14 05:38:58,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0
2024-08-14 05:38:59,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2505520.0, ans=0.125
2024-08-14 05:39:04,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0
2024-08-14 05:39:17,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4200, loss[loss=0.1091, beats_loss=0.01012, ecapa_loss=0.0001475, whisper_loss=0.09748, over 22187.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.000159, whisper_loss=0.09188, over 3894063.46 frames. ], batch size: 88, lr: 3.50e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:39:21,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2505720.0, ans=0.1
2024-08-14 05:39:25,531 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 from AS
2024-08-14 05:39:32,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.71 vs. limit=22.5
2024-08-14 05:39:33,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2024-08-14 05:39:54,581 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 22 from Vox, 17 from AS
2024-08-14 05:39:59,448 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 05:40:02,333 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 05:40:02,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2506020.0, ans=0.0
2024-08-14 05:40:05,056 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 from AS
2024-08-14 05:40:14,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2506020.0, ans=0.1
2024-08-14 05:40:24,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2506120.0, ans=0.125
2024-08-14 05:40:31,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4250, loss[loss=0.1074, beats_loss=0.01123, ecapa_loss=0.0001521, whisper_loss=0.09462, over 22214.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001594, whisper_loss=0.09155, over 3904970.60 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:40:35,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0
2024-08-14 05:40:46,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2506320.0, ans=0.1
2024-08-14 05:40:57,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2506320.0, ans=10.0
2024-08-14 05:40:57,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2506320.0, ans=0.0
2024-08-14 05:41:05,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2506420.0, ans=0.1
2024-08-14 05:41:05,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2506420.0, ans=0.125
2024-08-14 05:41:16,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.332e+01 2.542e+01 2.821e+01 5.499e+01, threshold=5.083e+01, percent-clipped=1.0
2024-08-14 05:41:21,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0
2024-08-14 05:41:28,130 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 from AS
2024-08-14 05:41:36,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0
2024-08-14 05:41:37,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2506620.0, ans=0.125
2024-08-14 05:41:41,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5
2024-08-14 05:41:44,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4300, loss[loss=0.08055, beats_loss=0.01227, ecapa_loss=0.0001834, whisper_loss=0.06644, over 19699.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001596, whisper_loss=0.0915, over 3898637.41 frames. ], batch size: 84, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:41:45,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2506720.0, ans=0.125
2024-08-14 05:41:56,907 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 9 from LS+wenet, 20 from Vox, 25 from AS
2024-08-14 05:41:58,446 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 05:42:02,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2506820.0, ans=0.0
2024-08-14 05:42:04,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2506820.0, ans=0.125
2024-08-14 05:42:21,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. limit=10.0
2024-08-14 05:42:25,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2506920.0, ans=0.125
2024-08-14 05:42:43,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2507120.0, ans=0.125
2024-08-14 05:42:43,721 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0
2024-08-14 05:42:49,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2507120.0, ans=0.2
2024-08-14 05:42:58,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2507220.0, ans=0.015
2024-08-14 05:42:59,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4350, loss[loss=0.104, beats_loss=0.01054, ecapa_loss=0.0001423, whisper_loss=0.09204, over 21689.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001605, whisper_loss=0.09136, over 3876370.36 frames. ], batch size: 86, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:43:02,832 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 05:43:06,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2507220.0, ans=0.125
2024-08-14 05:43:23,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2507320.0, ans=0.0
2024-08-14 05:43:28,461 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 from AS
2024-08-14 05:43:30,433 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-14 05:43:43,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.369e+01 2.648e+01 3.108e+01 4.930e+01, threshold=5.296e+01, percent-clipped=0.0
2024-08-14 05:43:51,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2507520.0, ans=0.125
2024-08-14 05:44:12,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4400, loss[loss=0.09211, beats_loss=0.01228, ecapa_loss=0.0001772, whisper_loss=0.07806, over 19908.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001598, whisper_loss=0.09058, over 3883682.76 frames. ], batch size: 80, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:44:14,211 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS
2024-08-14 05:44:17,623 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 from AS
2024-08-14 05:44:49,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0
2024-08-14 05:45:11,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2508020.0, ans=0.0
2024-08-14 05:45:12,127 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 from AS
2024-08-14 05:45:27,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4450, loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0001485, whisper_loss=0.09391, over 20617.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0107, ecapa_loss=0.0001588, whisper_loss=0.091, over 3891650.88 frames. ], batch size: 82, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:45:33,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0
2024-08-14 05:45:38,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2508220.0, ans=0.125
2024-08-14 05:45:45,881 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 from AS
2024-08-14 05:45:46,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2508320.0, ans=0.125
2024-08-14 05:45:47,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=12.0
2024-08-14 05:46:13,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.408e+01 2.704e+01 3.117e+01 4.091e+01, threshold=5.407e+01, percent-clipped=0.0
2024-08-14 05:46:24,357 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 from AS
2024-08-14 05:46:35,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2508620.0, ans=0.125
2024-08-14 05:46:38,272 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 from AS
2024-08-14 05:46:41,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2508720.0, ans=0.125
2024-08-14 05:46:42,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4500, loss[loss=0.1011, beats_loss=0.01013, ecapa_loss=0.0001706, whisper_loss=0.08924, over 20173.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01074, ecapa_loss=0.0001577, whisper_loss=0.09112, over 3904167.20 frames. ], batch size: 83, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:47:08,881 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 from AS
2024-08-14 05:47:18,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2508920.0, ans=0.1
2024-08-14 05:47:23,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2508920.0, ans=0.125
2024-08-14 05:47:26,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2508920.0, ans=0.125
2024-08-14 05:47:31,503 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 27 from Vox, 23 from AS
2024-08-14 05:47:48,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2509120.0, ans=0.0
2024-08-14 05:47:52,374 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 from AS
2024-08-14 05:47:53,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2509120.0, ans=0.125
2024-08-14 05:47:55,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2509120.0, ans=0.0
2024-08-14 05:47:57,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0
2024-08-14 05:48:01,705 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4550, loss[loss=0.1115, beats_loss=0.01059, ecapa_loss=0.0001365, whisper_loss=0.09953, over 22813.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001579, whisper_loss=0.09117, over 3893285.73 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:48:02,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2509220.0, ans=0.2
2024-08-14 05:48:20,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2509320.0, ans=0.125
2024-08-14 05:48:21,275 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 11 from Vox, 34 from AS
2024-08-14 05:48:33,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2509420.0, ans=0.125
2024-08-14 05:48:33,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0
2024-08-14 05:48:39,386 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 from AS
2024-08-14 05:48:47,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0
2024-08-14 05:48:51,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.350e+01 2.581e+01 3.010e+01 9.450e+01, threshold=5.163e+01, percent-clipped=2.0
2024-08-14 05:49:14,267 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.491e+01
2024-08-14 05:49:15,321 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 17 from LS+wenet, 24 from Vox, 41 from AS
2024-08-14 05:49:20,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4600, loss[loss=0.07427, beats_loss=0.01231, ecapa_loss=0.0001341, whisper_loss=0.06063, over 13872.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001588, whisper_loss=0.09068, over 3876756.36 frames. ], batch size: 54, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:49:24,609 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 05:49:37,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2509820.0, ans=0.125
2024-08-14 05:49:55,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0
2024-08-14 05:50:02,645 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 from AS
2024-08-14 05:50:02,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2509920.0, ans=0.125
2024-08-14 05:50:07,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2510020.0, ans=0.1
2024-08-14 05:50:17,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0
2024-08-14 05:50:37,832 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 22 from Vox, 51 from AS
2024-08-14 05:50:40,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2510220.0, ans=0.0
2024-08-14 05:50:41,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4650, loss[loss=0.09654, beats_loss=0.012, ecapa_loss=0.0001265, whisper_loss=0.08327, over 19616.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001578, whisper_loss=0.09082, over 3904923.93 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:51:06,059 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 from AS
2024-08-14 05:51:06,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2510320.0, ans=0.0
2024-08-14 05:51:10,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2510320.0, ans=0.2
2024-08-14 05:51:12,293 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 32 from LS+wenet, 18 from Vox, 25 from AS
2024-08-14 05:51:28,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2510520.0, ans=0.0
2024-08-14 05:51:30,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.365e+01 2.623e+01 2.877e+01 4.425e+01, threshold=5.246e+01, percent-clipped=0.0
2024-08-14 05:51:34,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2510520.0, ans=0.125
2024-08-14 05:52:00,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4700, loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.0001751, whisper_loss=0.0918, over 18818.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001581, whisper_loss=0.09072, over 3906915.74 frames. ], batch size: 75, lr: 3.49e-03, grad_scale: 5.764607523034235e+17
2024-08-14 05:52:01,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2510720.0, ans=0.0
2024-08-14 05:52:11,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2510720.0, ans=0.1
2024-08-14 05:52:12,360 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS
2024-08-14 05:52:29,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.65 vs.
limit=12.0 2024-08-14 05:52:40,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2510920.0, ans=0.0 2024-08-14 05:53:00,887 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 05:53:02,855 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-14 05:53:03,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2511120.0, ans=0.1 2024-08-14 05:53:17,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2511120.0, ans=10.0 2024-08-14 05:53:19,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4750, loss[loss=0.114, beats_loss=0.00802, ecapa_loss=0.0001384, whisper_loss=0.1046, over 16232.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001581, whisper_loss=0.09096, over 3903849.40 frames. ], batch size: 58, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:53:29,817 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-14 05:53:36,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2511320.0, ans=0.0 2024-08-14 05:53:40,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2511320.0, ans=0.2 2024-08-14 05:53:56,565 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
19 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-14 05:54:08,787 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.355e+01 2.556e+01 2.982e+01 9.125e+01, threshold=5.113e+01, percent-clipped=1.0 2024-08-14 05:54:11,693 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.226e-01 2024-08-14 05:54:29,263 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:54:31,847 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 34 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 05:54:34,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2511620.0, ans=0.125 2024-08-14 05:54:39,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4800, loss[loss=0.09296, beats_loss=0.009166, ecapa_loss=0.0001899, whisper_loss=0.0819, over 22060.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001599, whisper_loss=0.09087, over 3907819.21 frames. ], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:54:56,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2024-08-14 05:55:04,514 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 05:55:11,062 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 05:55:19,345 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 05:55:19,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2511920.0, ans=0.125 2024-08-14 05:55:22,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2511920.0, ans=0.125 2024-08-14 05:55:38,484 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 32 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-14 05:55:43,624 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 05:55:52,339 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.354e-02 2024-08-14 05:56:01,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4850, loss[loss=0.09729, beats_loss=0.007465, ecapa_loss=0.0001974, whisper_loss=0.08785, over 13921.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001598, whisper_loss=0.09102, over 3912992.34 frames. ], batch size: 57, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:56:02,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2512220.0, ans=0.125 2024-08-14 05:56:03,421 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 05:56:09,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2512220.0, ans=0.125 2024-08-14 05:56:37,507 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 05:56:41,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2512420.0, ans=0.09899494936611666 2024-08-14 05:56:51,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.430e+01 2.607e+01 2.996e+01 1.441e+02, threshold=5.214e+01, percent-clipped=2.0 2024-08-14 05:56:54,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2512520.0, ans=0.0 2024-08-14 05:57:02,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=22.5 2024-08-14 05:57:05,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2512620.0, ans=0.125 2024-08-14 05:57:10,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2512620.0, ans=0.125 2024-08-14 05:57:17,269 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 05:57:22,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4900, loss[loss=0.108, beats_loss=0.008912, ecapa_loss=0.0002004, whisper_loss=0.09712, over 13632.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001603, whisper_loss=0.09142, over 3892988.39 frames. ], batch size: 56, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:57:29,979 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 05:57:35,040 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 05:57:41,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2512820.0, ans=0.125 2024-08-14 05:57:44,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2512820.0, ans=0.2 2024-08-14 05:57:47,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2512820.0, ans=0.1 2024-08-14 05:57:57,182 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 35 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-14 05:58:08,719 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 05:58:25,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2024-08-14 05:58:36,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2513120.0, ans=0.125 2024-08-14 05:58:36,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2513120.0, ans=0.0 2024-08-14 05:58:39,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-14 05:58:40,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 4950, loss[loss=0.1076, beats_loss=0.01186, ecapa_loss=0.0001593, whisper_loss=0.09419, over 20792.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001597, whisper_loss=0.09108, over 3896770.74 frames. 
], batch size: 82, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 05:58:54,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2513320.0, ans=0.2 2024-08-14 05:58:59,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2513320.0, ans=0.1 2024-08-14 05:59:06,367 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 05:59:09,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=12.0 2024-08-14 05:59:21,974 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 05:59:26,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2513520.0, ans=0.2 2024-08-14 05:59:29,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.393e+01 2.657e+01 2.925e+01 4.625e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-14 05:59:29,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2513520.0, ans=0.2 2024-08-14 05:59:32,115 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 05:59:32,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2513520.0, ans=0.125 2024-08-14 05:59:34,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. 
limit=15.0 2024-08-14 05:59:37,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2513520.0, ans=0.2 2024-08-14 05:59:58,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5000, loss[loss=0.08473, beats_loss=0.01259, ecapa_loss=0.0001394, whisper_loss=0.07075, over 18549.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001599, whisper_loss=0.09099, over 3878039.76 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:00:03,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-14 06:00:09,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2513720.0, ans=0.035 2024-08-14 06:00:33,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2513920.0, ans=0.0 2024-08-14 06:00:39,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2513920.0, ans=0.125 2024-08-14 06:00:56,138 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 06:01:04,904 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-08-14 06:01:16,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5050, loss[loss=0.08208, beats_loss=0.01243, ecapa_loss=0.0001976, whisper_loss=0.06767, over 13126.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001587, whisper_loss=0.0907, over 3879395.75 frames. 
], batch size: 57, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:01:22,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2514220.0, ans=0.0 2024-08-14 06:01:37,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2514320.0, ans=0.125 2024-08-14 06:01:56,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2514420.0, ans=0.2 2024-08-14 06:01:56,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2514420.0, ans=0.0 2024-08-14 06:02:05,067 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.401e+01 2.602e+01 2.912e+01 4.134e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-14 06:02:06,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2514520.0, ans=0.1 2024-08-14 06:02:07,389 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:02:07,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.88 vs. 
limit=15.0 2024-08-14 06:02:10,626 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:02:15,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2514520.0, ans=0.125 2024-08-14 06:02:16,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2514520.0, ans=0.125 2024-08-14 06:02:34,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5100, loss[loss=0.1015, beats_loss=0.009739, ecapa_loss=0.0001412, whisper_loss=0.0903, over 17440.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.09184, over 3897154.09 frames. ], batch size: 68, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:02:44,732 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 06:02:55,343 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 06:02:57,418 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-14 06:03:01,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2514820.0, ans=0.125 2024-08-14 06:03:29,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2515020.0, ans=0.1 2024-08-14 06:03:46,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2515120.0, ans=0.125 2024-08-14 06:03:52,640 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 06:03:53,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5150, loss[loss=0.1117, beats_loss=0.01097, ecapa_loss=0.0001576, whisper_loss=0.09918, over 23100.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001563, whisper_loss=0.09151, over 3931272.92 frames. ], batch size: 92, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:03:55,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0 2024-08-14 06:04:09,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2515320.0, ans=0.125 2024-08-14 06:04:11,479 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 06:04:11,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2515320.0, ans=0.125 2024-08-14 06:04:14,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2515320.0, ans=0.125 2024-08-14 06:04:16,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2515320.0, ans=0.125 2024-08-14 06:04:20,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2515320.0, ans=0.1 2024-08-14 06:04:24,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.60 vs. 
limit=15.0 2024-08-14 06:04:42,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.426e+01 2.661e+01 3.204e+01 7.186e+01, threshold=5.323e+01, percent-clipped=2.0 2024-08-14 06:05:11,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2515720.0, ans=0.125 2024-08-14 06:05:12,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5200, loss[loss=0.1073, beats_loss=0.01136, ecapa_loss=0.0001368, whisper_loss=0.09453, over 17833.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001573, whisper_loss=0.0916, over 3896210.55 frames. ], batch size: 67, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:05:14,773 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 29 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-14 06:05:17,976 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 06:05:18,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2515720.0, ans=0.125 2024-08-14 06:05:34,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2024-08-14 06:06:02,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=22.5 2024-08-14 06:06:08,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.63 vs. limit=22.5 2024-08-14 06:06:17,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. 
limit=15.0 2024-08-14 06:06:32,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5250, loss[loss=0.1208, beats_loss=0.01082, ecapa_loss=0.0001352, whisper_loss=0.1086, over 23214.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001576, whisper_loss=0.09147, over 3895665.07 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:06:36,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2516220.0, ans=0.0 2024-08-14 06:06:48,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2516320.0, ans=0.0 2024-08-14 06:07:21,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.375e+01 2.671e+01 2.925e+01 9.126e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-14 06:07:22,960 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 06:07:38,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2516620.0, ans=0.125 2024-08-14 06:07:42,302 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:07:42,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2516620.0, ans=0.125 2024-08-14 06:07:50,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2516620.0, ans=0.09899494936611666 2024-08-14 06:07:51,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. 
limit=10.0 2024-08-14 06:07:52,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5300, loss[loss=0.1116, beats_loss=0.008122, ecapa_loss=0.0002242, whisper_loss=0.1012, over 15788.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01056, ecapa_loss=0.0001578, whisper_loss=0.09273, over 3865319.46 frames. ], batch size: 67, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:07:57,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2516720.0, ans=0.125 2024-08-14 06:08:15,345 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 06:08:15,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2516820.0, ans=0.125 2024-08-14 06:08:17,411 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 06:08:19,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2024-08-14 06:08:26,967 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 06:08:47,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2517020.0, ans=0.125 2024-08-14 06:09:12,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5350, loss[loss=0.1211, beats_loss=0.01143, ecapa_loss=0.0001613, whisper_loss=0.1081, over 21835.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0106, ecapa_loss=0.0001571, whisper_loss=0.09214, over 3858013.43 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:09:15,553 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 06:09:20,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0 2024-08-14 06:09:25,116 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:09:29,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2517320.0, ans=0.2 2024-08-14 06:10:01,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.326e+01 2.604e+01 3.065e+01 1.793e+02, threshold=5.208e+01, percent-clipped=2.0 2024-08-14 06:10:02,787 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.645e+01 2024-08-14 06:10:20,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=12.0 2024-08-14 06:10:31,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2517720.0, ans=0.05 2024-08-14 06:10:32,427 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5400, loss[loss=0.09609, beats_loss=0.01143, ecapa_loss=0.0001492, whisper_loss=0.08316, over 15343.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01053, ecapa_loss=0.0001574, whisper_loss=0.09255, over 3839205.71 frames. ], batch size: 61, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:10:54,768 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 06:10:56,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2517820.0, ans=0.125 2024-08-14 06:11:08,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=12.0 2024-08-14 06:11:12,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2517920.0, ans=0.0 2024-08-14 06:11:21,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2518020.0, ans=0.125 2024-08-14 06:11:33,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2518120.0, ans=0.1 2024-08-14 06:11:35,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2518120.0, ans=0.125 2024-08-14 06:11:45,410 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 06:11:51,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5450, loss[loss=0.1142, beats_loss=0.009063, ecapa_loss=0.0002008, whisper_loss=0.1032, over 19138.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01059, ecapa_loss=0.0001581, whisper_loss=0.09191, over 3842428.64 frames. ], batch size: 76, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:11:56,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. 
limit=6.0 2024-08-14 06:11:57,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2518220.0, ans=0.2 2024-08-14 06:12:15,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2518320.0, ans=0.125 2024-08-14 06:12:39,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.87 vs. limit=10.0 2024-08-14 06:12:41,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.369e+01 2.569e+01 2.930e+01 1.155e+02, threshold=5.138e+01, percent-clipped=3.0 2024-08-14 06:13:10,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5500, loss[loss=0.0946, beats_loss=0.01283, ecapa_loss=0.0001758, whisper_loss=0.08, over 19166.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001564, whisper_loss=0.09168, over 3856301.27 frames. ], batch size: 78, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:13:48,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2518920.0, ans=0.0 2024-08-14 06:14:08,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2519020.0, ans=0.125 2024-08-14 06:14:21,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2519120.0, ans=0.125 2024-08-14 06:14:26,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.94 vs. limit=22.5 2024-08-14 06:14:30,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5550, loss[loss=0.1028, beats_loss=0.01188, ecapa_loss=0.0001742, whisper_loss=0.08917, over 21503.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01066, ecapa_loss=0.0001573, whisper_loss=0.09205, over 3894214.16 frames. ], batch size: 87, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:14:48,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2519320.0, ans=0.0 2024-08-14 06:14:51,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2519320.0, ans=0.125 2024-08-14 06:15:21,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2519520.0, ans=0.125 2024-08-14 06:15:21,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.517e+01 2.810e+01 6.286e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-14 06:15:22,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2519520.0, ans=0.2 2024-08-14 06:15:34,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2519620.0, ans=0.0 2024-08-14 06:15:37,539 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 06:15:49,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2519720.0, ans=0.125 2024-08-14 06:15:50,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5600, loss[loss=0.1322, beats_loss=0.008134, ecapa_loss=0.0001841, whisper_loss=0.1223, over 22397.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001584, whisper_loss=0.09153, over 3867436.00 frames. 
], batch size: 90, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:16:12,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2519820.0, ans=0.2 2024-08-14 06:16:29,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2519920.0, ans=0.125 2024-08-14 06:16:31,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0 2024-08-14 06:16:41,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-08-14 06:17:06,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2520120.0, ans=0.125 2024-08-14 06:17:10,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5650, loss[loss=0.1145, beats_loss=0.01035, ecapa_loss=0.0001386, whisper_loss=0.1027, over 22772.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001578, whisper_loss=0.09139, over 3873689.55 frames. ], batch size: 88, lr: 3.49e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:17:12,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2520220.0, ans=0.125 2024-08-14 06:17:13,967 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 06:17:15,354 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-14 06:17:30,293 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 06:17:40,909 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
37 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 06:18:00,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.353e+01 2.635e+01 2.874e+01 6.701e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-14 06:18:18,465 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 06:18:32,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5700, loss[loss=0.1169, beats_loss=0.01089, ecapa_loss=0.0001132, whisper_loss=0.1049, over 24873.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01076, ecapa_loss=0.0001579, whisper_loss=0.09131, over 3897158.96 frames. ], batch size: 92, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:18:34,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2520720.0, ans=0.0 2024-08-14 06:18:42,292 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:18:54,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2520820.0, ans=0.1 2024-08-14 06:19:28,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2521020.0, ans=0.0 2024-08-14 06:19:28,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.57 vs. limit=6.0 2024-08-14 06:19:32,441 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 06:19:41,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2521120.0, ans=0.0 2024-08-14 06:19:51,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.21 vs. 
limit=15.0 2024-08-14 06:19:51,833 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 06:19:52,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5750, loss[loss=0.09244, beats_loss=0.009286, ecapa_loss=0.0001728, whisper_loss=0.08143, over 15998.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001577, whisper_loss=0.09097, over 3871031.11 frames. ], batch size: 61, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:19:55,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2521220.0, ans=0.2 2024-08-14 06:19:56,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2521220.0, ans=0.0 2024-08-14 06:20:00,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2521220.0, ans=0.125 2024-08-14 06:20:10,747 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-14 06:20:15,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2521320.0, ans=0.125 2024-08-14 06:20:34,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2521420.0, ans=0.2 2024-08-14 06:20:36,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. 
limit=15.0 2024-08-14 06:20:41,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.372e+01 2.640e+01 2.859e+01 6.893e+01, threshold=5.281e+01, percent-clipped=1.0 2024-08-14 06:20:42,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2521520.0, ans=0.125 2024-08-14 06:21:12,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5800, loss[loss=0.1054, beats_loss=0.009739, ecapa_loss=0.0001569, whisper_loss=0.09408, over 16684.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.0001586, whisper_loss=0.09064, over 3873624.74 frames. ], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:21:14,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2521720.0, ans=0.125 2024-08-14 06:21:14,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2521720.0, ans=0.0 2024-08-14 06:21:19,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2521720.0, ans=0.125 2024-08-14 06:21:27,468 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 06:21:47,293 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 06:21:49,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2521920.0, ans=0.125 2024-08-14 06:22:11,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2522120.0, ans=0.035 2024-08-14 06:22:26,918 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5850, loss[loss=0.112, beats_loss=0.01063, ecapa_loss=0.0001356, whisper_loss=0.1, over 20180.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001589, whisper_loss=0.09065, over 3876559.71 frames. ], batch size: 79, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:22:38,313 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 06:22:42,615 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.130e-02 2024-08-14 06:22:43,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2522320.0, ans=0.125 2024-08-14 06:22:45,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2522320.0, ans=0.125 2024-08-14 06:22:48,053 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 06:22:48,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2522320.0, ans=0.125 2024-08-14 06:22:49,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2522320.0, ans=0.04949747468305833 2024-08-14 06:23:01,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2024-08-14 06:23:04,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.14 vs. limit=22.5 2024-08-14 06:23:10,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.428e+01 2.673e+01 2.941e+01 3.816e+01, threshold=5.346e+01, percent-clipped=0.0 2024-08-14 06:23:23,731 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 06:23:31,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2522620.0, ans=0.125 2024-08-14 06:23:38,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5900, loss[loss=0.1055, beats_loss=0.01085, ecapa_loss=0.0001473, whisper_loss=0.09318, over 24050.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001586, whisper_loss=0.09075, over 3913040.94 frames. ], batch size: 94, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:23:39,884 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 06:23:40,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2522720.0, ans=0.125 2024-08-14 06:23:44,184 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 06:23:55,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2522820.0, ans=0.125 2024-08-14 06:24:17,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-14 06:24:20,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-14 06:24:27,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-14 06:24:35,447 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 06:24:37,870 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 06:24:45,110 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 06:24:47,702 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 5950, loss[loss=0.09727, beats_loss=0.009735, ecapa_loss=0.0001536, whisper_loss=0.086, over 14649.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01081, ecapa_loss=0.0001587, whisper_loss=0.09075, over 3895765.09 frames. ], batch size: 55, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:24:55,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2523220.0, ans=0.1 2024-08-14 06:24:56,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2523220.0, ans=0.95 2024-08-14 06:25:15,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2523420.0, ans=0.1 2024-08-14 06:25:15,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.82 vs. limit=22.5 2024-08-14 06:25:27,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-08-14 06:25:28,067 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 06:25:30,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.432e+01 2.806e+01 3.149e+01 6.455e+01, threshold=5.612e+01, percent-clipped=2.0 2024-08-14 06:25:33,331 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 06:25:39,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2523520.0, ans=0.2 2024-08-14 06:25:42,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2523620.0, ans=0.125 2024-08-14 06:25:44,164 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 06:25:56,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6000, loss[loss=0.08297, beats_loss=0.01189, ecapa_loss=0.0001361, whisper_loss=0.06972, over 19907.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001579, whisper_loss=0.09098, over 3902970.10 frames. ], batch size: 79, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:25:56,746 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 06:26:36,889 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on ASR_libri: loss=0.2513, beats_loss=0, ecapa_loss=0.0005424, whisper_loss=0.2459, over 922467.00 frames. 2024-08-14 06:26:55,922 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on SV_voxceleb1: loss=0.004393, beats_loss=0, ecapa_loss=0.0004393, whisper_loss=0, over 939242.00 frames. 2024-08-14 06:28:56,695 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on AT_audioset: loss=0.02347, beats_loss=0.02347, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 06:28:56,699 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 06:29:05,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2523720.0, ans=0.0 2024-08-14 06:29:31,269 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 06:29:44,580 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 06:29:47,556 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 06:29:47,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2524020.0, ans=0.0 2024-08-14 06:30:01,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.15 vs. limit=5.0 2024-08-14 06:30:05,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6050, loss[loss=0.1167, beats_loss=0.009505, ecapa_loss=0.0001612, whisper_loss=0.1056, over 23461.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001576, whisper_loss=0.09172, over 3926673.59 frames. ], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:30:22,655 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 06:30:22,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2524320.0, ans=0.035 2024-08-14 06:30:22,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2524320.0, ans=0.125 2024-08-14 06:30:27,855 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 18 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 06:30:32,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2524420.0, ans=0.1 2024-08-14 06:30:33,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. 
limit=10.0 2024-08-14 06:30:34,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2524420.0, ans=0.125 2024-08-14 06:30:38,116 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 06:30:40,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=12.0 2024-08-14 06:30:44,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2524420.0, ans=10.0 2024-08-14 06:30:48,159 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 06:30:49,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.348e+01 2.542e+01 2.875e+01 5.513e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 06:30:51,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2524520.0, ans=0.125 2024-08-14 06:31:14,047 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 06:31:15,175 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6100, loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001183, whisper_loss=0.09214, over 19525.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001576, whisper_loss=0.09173, over 3924068.92 frames. ], batch size: 74, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:31:19,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2524720.0, ans=10.0 2024-08-14 06:31:25,175 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 06:31:50,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2524920.0, ans=0.0 2024-08-14 06:32:13,668 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-14 06:32:25,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6150, loss[loss=0.1105, beats_loss=0.008786, ecapa_loss=0.0001985, whisper_loss=0.09977, over 20596.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001574, whisper_loss=0.09167, over 3909969.74 frames. ], batch size: 87, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:32:39,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2525320.0, ans=0.0 2024-08-14 06:33:03,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2525420.0, ans=0.125 2024-08-14 06:33:10,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.296e+01 2.588e+01 2.950e+01 9.161e+01, threshold=5.175e+01, percent-clipped=1.0 2024-08-14 06:33:28,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2525620.0, ans=0.0 2024-08-14 06:33:37,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6200, loss[loss=0.0944, beats_loss=0.01155, ecapa_loss=0.0001153, whisper_loss=0.0817, over 18507.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.0001571, whisper_loss=0.09208, over 3939008.84 frames. 
], batch size: 71, lr: 3.48e-03, grad_scale: 1.152921504606847e+18 2024-08-14 06:33:56,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2525820.0, ans=0.2 2024-08-14 06:33:56,302 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.348e+00 2024-08-14 06:33:58,927 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 06:34:01,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-14 06:34:07,047 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 06:34:10,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2525920.0, ans=0.125 2024-08-14 06:34:11,583 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:34:13,191 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 06:34:17,314 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 06:34:17,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2525920.0, ans=0.0 2024-08-14 06:34:22,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2526020.0, ans=0.1 2024-08-14 06:34:22,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2526020.0, ans=0.125 2024-08-14 06:34:28,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2526020.0, ans=0.2 2024-08-14 06:34:29,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2526020.0, ans=0.125 2024-08-14 06:34:46,536 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 06:34:54,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6250, loss[loss=0.1179, beats_loss=0.008225, ecapa_loss=0.0001824, whisper_loss=0.1079, over 18639.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01074, ecapa_loss=0.0001579, whisper_loss=0.09165, over 3960930.79 frames. ], batch size: 74, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:34:57,331 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 06:35:02,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2526220.0, ans=0.125 2024-08-14 06:35:16,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.94 vs. 
limit=22.5 2024-08-14 06:35:39,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2526420.0, ans=0.125 2024-08-14 06:35:41,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2526520.0, ans=0.125 2024-08-14 06:35:44,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.485e+01 2.719e+01 3.146e+01 4.092e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-14 06:35:45,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2526520.0, ans=0.0 2024-08-14 06:35:48,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2526520.0, ans=0.1 2024-08-14 06:35:49,304 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 06:35:57,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2526620.0, ans=0.025 2024-08-14 06:35:58,873 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-14 06:36:11,027 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 06:36:12,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6300, loss[loss=0.09186, beats_loss=0.01259, ecapa_loss=0.0001818, whisper_loss=0.07744, over 21862.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001566, whisper_loss=0.09175, over 3931188.92 frames. ], batch size: 93, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:36:20,088 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-14 06:37:03,550 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 06:37:04,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2527020.0, ans=0.0 2024-08-14 06:37:17,496 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 06:37:30,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6350, loss[loss=0.1078, beats_loss=0.01068, ecapa_loss=0.0002133, whisper_loss=0.09495, over 20898.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01072, ecapa_loss=0.0001583, whisper_loss=0.09168, over 3908081.43 frames. ], batch size: 90, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:37:44,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-14 06:37:45,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2527320.0, ans=0.2 2024-08-14 06:37:46,610 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 06:37:58,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2527320.0, ans=0.125 2024-08-14 06:38:19,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.286e+01 2.522e+01 2.892e+01 3.872e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 06:38:41,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2527620.0, ans=0.2 2024-08-14 06:38:46,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=12.0 2024-08-14 06:38:47,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6400, loss[loss=0.1025, beats_loss=0.01024, ecapa_loss=0.0001785, whisper_loss=0.09045, over 22164.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001572, whisper_loss=0.0916, over 3931143.20 frames. ], batch size: 91, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:38:58,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2527720.0, ans=0.1 2024-08-14 06:39:18,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2527920.0, ans=0.1 2024-08-14 06:39:57,912 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 06:40:06,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6450, loss[loss=0.1147, beats_loss=0.008899, ecapa_loss=0.0001846, whisper_loss=0.1039, over 18315.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001573, whisper_loss=0.09133, over 3956958.31 frames. ], batch size: 77, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:40:24,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2528320.0, ans=0.2 2024-08-14 06:40:36,286 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.320e-01 2024-08-14 06:40:39,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-14 06:40:43,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2024-08-14 06:40:44,362 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 06:40:49,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2528420.0, ans=0.0 2024-08-14 06:40:56,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2528520.0, ans=0.0 2024-08-14 06:40:56,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.368e+01 2.657e+01 3.046e+01 7.930e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 06:40:57,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2528520.0, ans=10.0 2024-08-14 06:40:59,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2528520.0, ans=0.125 2024-08-14 06:41:06,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2024-08-14 06:41:12,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2528620.0, ans=0.0 2024-08-14 06:41:14,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2528620.0, ans=0.1 2024-08-14 06:41:24,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6500, loss[loss=0.09192, beats_loss=0.01197, ecapa_loss=0.0001515, whisper_loss=0.07844, over 21937.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001577, whisper_loss=0.09183, over 3943313.77 frames. ], batch size: 92, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:41:37,563 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 06:41:45,505 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-14 06:41:56,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.49 vs. limit=10.0 2024-08-14 06:41:58,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2528920.0, ans=0.125 2024-08-14 06:42:12,714 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 06:42:14,134 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 06:42:37,197 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 06:42:43,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6550, loss[loss=0.08284, beats_loss=0.01034, ecapa_loss=0.0001839, whisper_loss=0.07066, over 19014.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01067, ecapa_loss=0.0001575, whisper_loss=0.09248, over 3964061.17 frames. ], batch size: 81, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:42:49,381 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 27 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-14 06:42:53,204 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
27 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-14 06:42:59,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2529320.0, ans=0.125 2024-08-14 06:43:20,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2529420.0, ans=0.0 2024-08-14 06:43:20,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2529420.0, ans=0.125 2024-08-14 06:43:24,881 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 06:43:27,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2529420.0, ans=0.0 2024-08-14 06:43:35,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.448e+01 2.627e+01 2.898e+01 7.209e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-14 06:43:41,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2529520.0, ans=0.09899494936611666 2024-08-14 06:43:46,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2529520.0, ans=0.0 2024-08-14 06:43:46,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2529520.0, ans=0.05 2024-08-14 06:44:05,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6600, loss[loss=0.1179, beats_loss=0.008166, ecapa_loss=0.0001695, whisper_loss=0.1081, over 14332.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01069, ecapa_loss=0.0001577, whisper_loss=0.09234, over 3980829.51 frames. 
], batch size: 56, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:44:27,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2529820.0, ans=0.125 2024-08-14 06:44:33,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2024-08-14 06:44:44,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2529920.0, ans=0.1 2024-08-14 06:44:46,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-08-14 06:44:47,942 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.800e-02 2024-08-14 06:44:56,191 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 06:44:56,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2530020.0, ans=0.125 2024-08-14 06:45:03,138 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 06:45:28,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6650, loss[loss=0.1019, beats_loss=0.01117, ecapa_loss=0.0001264, whisper_loss=0.08951, over 19763.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01076, ecapa_loss=0.0001571, whisper_loss=0.09202, over 3975580.62 frames. 
], batch size: 75, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:45:51,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2530320.0, ans=0.125 2024-08-14 06:45:54,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2530320.0, ans=0.1 2024-08-14 06:46:16,340 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 06:46:20,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.583e+01 2.896e+01 3.977e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 06:46:32,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-14 06:46:37,446 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-14 06:46:48,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6700, loss[loss=0.1207, beats_loss=0.01008, ecapa_loss=0.0001189, whisper_loss=0.1095, over 25702.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01075, ecapa_loss=0.0001572, whisper_loss=0.09156, over 3932483.50 frames. ], batch size: 93, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:46:50,605 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 06:46:52,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2530720.0, ans=0.0 2024-08-14 06:46:53,394 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-14 06:46:53,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2530720.0, ans=0.07 2024-08-14 06:47:32,963 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 06:48:16,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6750, loss[loss=0.1033, beats_loss=0.01248, ecapa_loss=0.0001407, whisper_loss=0.0894, over 16744.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001573, whisper_loss=0.09142, over 3895034.98 frames. ], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:49:03,099 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 06:49:08,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.298e+01 2.539e+01 2.885e+01 4.400e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-14 06:49:11,155 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 06:49:15,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2531520.0, ans=0.125 2024-08-14 06:49:15,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2531520.0, ans=0.125 2024-08-14 06:49:28,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2531620.0, ans=0.125 2024-08-14 06:49:39,206 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6800, loss[loss=0.1158, beats_loss=0.008694, ecapa_loss=0.0001949, whisper_loss=0.1051, over 15796.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001572, whisper_loss=0.09139, over 3856977.54 frames. 
], batch size: 63, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:49:48,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2531720.0, ans=0.125 2024-08-14 06:49:52,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.83 vs. limit=22.5 2024-08-14 06:49:57,766 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 06:50:11,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2531820.0, ans=0.1 2024-08-14 06:50:13,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2531920.0, ans=0.125 2024-08-14 06:50:23,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2531920.0, ans=0.2 2024-08-14 06:50:38,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2532020.0, ans=0.2 2024-08-14 06:50:58,834 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 06:51:07,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2532120.0, ans=0.2 2024-08-14 06:51:09,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6850, loss[loss=0.1122, beats_loss=0.00864, ecapa_loss=0.000179, whisper_loss=0.1018, over 16702.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01078, ecapa_loss=0.000157, whisper_loss=0.0902, over 3851769.32 frames. ], batch size: 69, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:51:11,285 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 06:51:24,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2532220.0, ans=0.125 2024-08-14 06:51:27,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2532320.0, ans=0.0 2024-08-14 06:51:32,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2532320.0, ans=0.125 2024-08-14 06:51:35,300 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 06:51:46,484 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 06:51:59,874 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 06:52:03,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.338e+01 2.591e+01 2.972e+01 6.425e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-14 06:52:07,772 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-14 06:52:09,197 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 06:52:13,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2532520.0, ans=0.1 2024-08-14 06:52:40,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6900, loss[loss=0.09701, beats_loss=0.01106, ecapa_loss=0.000138, whisper_loss=0.08457, over 21681.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01087, ecapa_loss=0.0001569, whisper_loss=0.08982, over 3870882.54 frames. 
], batch size: 86, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:52:44,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2532720.0, ans=0.0 2024-08-14 06:52:56,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2532720.0, ans=0.125 2024-08-14 06:52:56,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-14 06:52:59,560 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 06:53:12,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2532820.0, ans=0.2 2024-08-14 06:53:30,800 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 06:53:56,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2533020.0, ans=0.125 2024-08-14 06:54:00,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2533020.0, ans=0.1 2024-08-14 06:54:02,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2533020.0, ans=0.125 2024-08-14 06:54:07,517 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 06:54:17,374 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 06:54:17,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.32 vs. 
limit=15.0 2024-08-14 06:54:21,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=22.5 2024-08-14 06:54:30,191 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 6950, loss[loss=0.105, beats_loss=0.01294, ecapa_loss=0.0001295, whisper_loss=0.09081, over 16509.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01091, ecapa_loss=0.000157, whisper_loss=0.09026, over 3911104.13 frames. ], batch size: 64, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:54:31,527 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-14 06:54:50,118 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 06:55:08,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2533320.0, ans=0.125 2024-08-14 06:55:10,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2533320.0, ans=0.0 2024-08-14 06:55:38,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5 2024-08-14 06:55:41,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.379e+01 2.570e+01 2.940e+01 3.906e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 06:56:12,513 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 06:56:20,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7000, loss[loss=0.0994, beats_loss=0.01283, ecapa_loss=0.000148, whisper_loss=0.08509, over 21035.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01084, ecapa_loss=0.0001578, whisper_loss=0.09042, over 3915079.88 frames. 
], batch size: 87, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:56:22,109 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 06:56:28,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2533720.0, ans=0.125 2024-08-14 06:56:35,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2533720.0, ans=0.07 2024-08-14 06:56:54,708 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 06:57:17,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2533920.0, ans=0.0 2024-08-14 06:57:47,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2534120.0, ans=0.0 2024-08-14 06:57:58,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2534120.0, ans=0.0 2024-08-14 06:57:58,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2534120.0, ans=0.05 2024-08-14 06:58:01,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7050, loss[loss=0.1021, beats_loss=0.009758, ecapa_loss=0.0001428, whisper_loss=0.09095, over 14745.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001576, whisper_loss=0.09104, over 3901017.08 frames. ], batch size: 57, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:58:27,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2024-08-14 06:58:35,692 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 06:58:48,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.340e+01 2.637e+01 3.082e+01 1.011e+02, threshold=5.275e+01, percent-clipped=1.0 2024-08-14 06:58:48,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2534520.0, ans=0.125 2024-08-14 06:58:51,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2534520.0, ans=0.05 2024-08-14 06:58:53,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-08-14 06:58:54,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2534520.0, ans=0.0 2024-08-14 06:58:56,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2534520.0, ans=0.125 2024-08-14 06:58:57,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2534520.0, ans=0.125 2024-08-14 06:59:01,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2534620.0, ans=0.2 2024-08-14 06:59:09,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2534620.0, ans=0.2 2024-08-14 06:59:13,999 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7100, loss[loss=0.1119, beats_loss=0.0111, ecapa_loss=0.0001568, whisper_loss=0.09924, over 22561.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01083, ecapa_loss=0.0001564, whisper_loss=0.09086, over 3890113.34 frames. 
], batch size: 90, lr: 3.48e-03, grad_scale: 5.764607523034235e+17 2024-08-14 06:59:46,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2534920.0, ans=0.125 2024-08-14 06:59:59,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2535020.0, ans=0.0 2024-08-14 07:00:31,576 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7150, loss[loss=0.108, beats_loss=0.01219, ecapa_loss=0.0001401, whisper_loss=0.0944, over 22985.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001558, whisper_loss=0.09086, over 3881411.02 frames. ], batch size: 94, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:00:35,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-14 07:00:37,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2535220.0, ans=0.125 2024-08-14 07:00:39,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0 2024-08-14 07:00:41,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2535220.0, ans=0.0 2024-08-14 07:00:43,776 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 07:01:03,946 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 07:01:20,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.290e+01 2.562e+01 2.920e+01 7.577e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 07:01:20,301 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-14 07:01:21,749 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 07:01:31,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2535620.0, ans=0.125 2024-08-14 07:01:32,301 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 07:01:38,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2535620.0, ans=0.125 2024-08-14 07:01:39,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2535620.0, ans=0.125 2024-08-14 07:01:42,819 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 12 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 07:01:47,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7200, loss[loss=0.1144, beats_loss=0.009786, ecapa_loss=0.0001478, whisper_loss=0.1032, over 23531.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0108, ecapa_loss=0.0001559, whisper_loss=0.0912, over 3874647.51 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:01:56,070 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 29 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-14 07:01:59,132 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 07:02:05,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2535820.0, ans=0.125 2024-08-14 07:02:09,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2535820.0, ans=0.125 2024-08-14 07:02:20,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-14 07:02:31,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2535920.0, ans=0.2 2024-08-14 07:03:00,597 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 07:03:02,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2536120.0, ans=0.125 2024-08-14 07:03:03,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2536220.0, ans=0.0 2024-08-14 07:03:04,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7250, loss[loss=0.1017, beats_loss=0.01241, ecapa_loss=0.000183, whisper_loss=0.0875, over 21635.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.000156, whisper_loss=0.09175, over 3886427.66 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:03:22,124 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 07:03:35,883 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
19 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 07:03:40,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2536420.0, ans=0.1 2024-08-14 07:03:51,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2536520.0, ans=0.125 2024-08-14 07:03:55,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.431e+01 2.606e+01 2.894e+01 4.565e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-14 07:04:03,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2536520.0, ans=0.125 2024-08-14 07:04:22,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7300, loss[loss=0.102, beats_loss=0.01111, ecapa_loss=0.0001559, whisper_loss=0.08938, over 20943.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01078, ecapa_loss=0.0001562, whisper_loss=0.09128, over 3887420.26 frames. ], batch size: 87, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:04:25,926 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 07:04:33,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. 
limit=15.0 2024-08-14 07:04:46,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2536820.0, ans=0.0 2024-08-14 07:04:57,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2536920.0, ans=0.125 2024-08-14 07:05:06,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2536920.0, ans=0.1 2024-08-14 07:05:16,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2537020.0, ans=0.2 2024-08-14 07:05:20,922 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-14 07:05:22,989 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.576e+00 2024-08-14 07:05:31,463 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 07:05:31,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2537120.0, ans=0.2 2024-08-14 07:05:38,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7350, loss[loss=0.1014, beats_loss=0.01089, ecapa_loss=0.0001651, whisper_loss=0.08888, over 21846.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001577, whisper_loss=0.09088, over 3891202.65 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:06:03,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2537320.0, ans=0.05 2024-08-14 07:06:06,240 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 07:06:12,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-08-14 07:06:26,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.465e+01 2.701e+01 2.899e+01 2.044e+02, threshold=5.402e+01, percent-clipped=2.0 2024-08-14 07:06:37,902 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 07:06:44,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2537620.0, ans=0.1 2024-08-14 07:06:54,430 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7400, loss[loss=0.1176, beats_loss=0.009279, ecapa_loss=0.0001696, whisper_loss=0.1066, over 18636.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001584, whisper_loss=0.09092, over 3875759.40 frames. ], batch size: 74, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:06:54,747 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 07:07:08,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-14 07:07:15,677 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-14 07:07:19,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2537820.0, ans=0.1 2024-08-14 07:07:20,640 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
14 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 07:07:23,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2537820.0, ans=0.0 2024-08-14 07:07:34,747 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 07:07:39,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2537920.0, ans=0.125 2024-08-14 07:07:54,258 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 07:08:12,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7450, loss[loss=0.08636, beats_loss=0.0142, ecapa_loss=0.0001367, whisper_loss=0.07079, over 21699.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001585, whisper_loss=0.09137, over 3901017.00 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:08:14,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2538220.0, ans=0.125 2024-08-14 07:08:21,331 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 12 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 07:08:27,662 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 07:08:59,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2538420.0, ans=0.125 2024-08-14 07:09:04,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. 
limit=15.0 2024-08-14 07:09:05,385 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.665e+01 3.000e+01 5.031e+01, threshold=5.329e+01, percent-clipped=0.0 2024-08-14 07:09:16,869 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 07:09:20,155 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:09:33,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7500, loss[loss=0.1054, beats_loss=0.01076, ecapa_loss=0.0001396, whisper_loss=0.0932, over 23901.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.0001577, whisper_loss=0.09202, over 3919154.53 frames. ], batch size: 93, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:09:41,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2538720.0, ans=0.2 2024-08-14 07:10:03,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2538820.0, ans=0.0 2024-08-14 07:10:03,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2538820.0, ans=0.2 2024-08-14 07:10:10,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2024-08-14 07:10:42,039 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 07:10:49,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-14 07:10:51,537 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 07:10:54,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7550, loss[loss=0.1314, beats_loss=0.01048, ecapa_loss=0.0001648, whisper_loss=0.1193, over 24087.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001571, whisper_loss=0.09107, over 3888136.44 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:10:57,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-14 07:10:57,731 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 34 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 07:11:15,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2024-08-14 07:11:29,595 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 07:11:46,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.298e+01 2.593e+01 2.946e+01 4.435e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 07:11:50,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2539520.0, ans=0.1 2024-08-14 07:12:07,516 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 07:12:10,230 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 07:12:15,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7600, loss[loss=0.1179, beats_loss=0.008508, ecapa_loss=0.0001643, whisper_loss=0.1077, over 24102.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001577, whisper_loss=0.09136, over 3889169.87 frames. 
], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:12:17,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=12.0 2024-08-14 07:12:33,667 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 07:12:38,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-08-14 07:12:43,075 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 40 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 07:13:27,376 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 07:13:33,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7650, loss[loss=0.1186, beats_loss=0.009474, ecapa_loss=0.0001767, whisper_loss=0.1074, over 20604.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001578, whisper_loss=0.09149, over 3874526.20 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:14:10,750 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 07:14:25,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.299e+01 2.494e+01 2.918e+01 5.997e+01, threshold=4.989e+01, percent-clipped=1.0 2024-08-14 07:14:26,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2540520.0, ans=0.125 2024-08-14 07:14:28,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2540520.0, ans=0.125 2024-08-14 07:14:35,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2540520.0, ans=0.1 2024-08-14 07:14:45,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2540620.0, ans=0.2 2024-08-14 07:14:53,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.94 vs. limit=8.0 2024-08-14 07:14:53,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7700, loss[loss=0.07721, beats_loss=0.01301, ecapa_loss=0.000177, whisper_loss=0.06244, over 20382.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.000157, whisper_loss=0.0905, over 3894225.92 frames. ], batch size: 90, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:14:53,866 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 07:14:55,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. 
limit=15.0 2024-08-14 07:15:05,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2540720.0, ans=0.0 2024-08-14 07:15:10,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2540820.0, ans=0.1 2024-08-14 07:15:15,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2540820.0, ans=0.1 2024-08-14 07:15:16,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2540820.0, ans=0.95 2024-08-14 07:15:36,196 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 07:15:43,256 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 07:16:01,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-08-14 07:16:01,487 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 07:16:13,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7750, loss[loss=0.1124, beats_loss=0.01048, ecapa_loss=0.0001699, whisper_loss=0.1002, over 21649.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001575, whisper_loss=0.09087, over 3891189.80 frames. 
], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:16:24,031 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:16:46,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2541420.0, ans=0.0 2024-08-14 07:17:04,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.379e+01 2.592e+01 2.812e+01 4.047e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-14 07:17:07,652 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-14 07:17:12,286 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 07:17:14,882 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 07:17:20,324 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 07:17:30,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7800, loss[loss=0.09956, beats_loss=0.01023, ecapa_loss=0.0001753, whisper_loss=0.08758, over 21014.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001568, whisper_loss=0.09054, over 3879611.54 frames. ], batch size: 88, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:17:33,215 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 07:17:40,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2541720.0, ans=0.1 2024-08-14 07:17:51,610 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 07:18:01,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2541920.0, ans=0.125 2024-08-14 07:18:12,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2541920.0, ans=0.0 2024-08-14 07:18:12,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2541920.0, ans=0.125 2024-08-14 07:18:20,866 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 07:18:28,185 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 07:18:38,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2542120.0, ans=0.125 2024-08-14 07:18:41,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2542120.0, ans=0.125 2024-08-14 07:18:43,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7850, loss[loss=0.1138, beats_loss=0.01025, ecapa_loss=0.000201, whisper_loss=0.1016, over 19848.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01085, ecapa_loss=0.0001584, whisper_loss=0.09, over 3884433.33 frames. 
], batch size: 83, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:18:49,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2542220.0, ans=0.035 2024-08-14 07:18:52,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2542220.0, ans=0.1 2024-08-14 07:18:58,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2542320.0, ans=0.125 2024-08-14 07:19:01,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2542320.0, ans=0.2 2024-08-14 07:19:27,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2542520.0, ans=0.0 2024-08-14 07:19:29,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.387e+01 2.602e+01 2.902e+01 1.105e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 07:19:30,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.14 vs. limit=15.0 2024-08-14 07:19:30,801 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 07:19:35,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2542520.0, ans=0.2 2024-08-14 07:19:42,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2542620.0, ans=0.125 2024-08-14 07:19:54,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7900, loss[loss=0.09848, beats_loss=0.01115, ecapa_loss=0.0001493, whisper_loss=0.08584, over 22196.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0109, ecapa_loss=0.0001577, whisper_loss=0.09054, over 3890558.34 frames. 
], batch size: 89, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:19:57,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2542720.0, ans=0.0 2024-08-14 07:20:20,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2542820.0, ans=0.125 2024-08-14 07:20:27,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. limit=10.0 2024-08-14 07:20:28,855 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04889056831598282, model_norm_threshold=52.03104019165039 2024-08-14 07:20:29,029 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.668e+05, grad_sumsq=1.668e+05, orig_rms_sq=1.000e+00 2024-08-14 07:20:32,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2542920.0, ans=0.125 2024-08-14 07:20:35,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2542920.0, ans=0.0 2024-08-14 07:20:50,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2543020.0, ans=0.125 2024-08-14 07:21:03,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2543120.0, ans=0.125 2024-08-14 07:21:06,567 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 7950, loss[loss=0.09407, beats_loss=0.01166, ecapa_loss=0.0001075, whisper_loss=0.08134, over 16108.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01091, ecapa_loss=0.0001574, whisper_loss=0.0906, over 3905937.96 frames. 
], batch size: 62, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:21:08,256 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 07:21:38,476 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.548e-03 2024-08-14 07:21:47,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2543420.0, ans=0.0 2024-08-14 07:21:54,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.373e+01 2.668e+01 3.252e+01 1.064e+03, threshold=5.336e+01, percent-clipped=2.0 2024-08-14 07:22:04,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2543620.0, ans=0.125 2024-08-14 07:22:19,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8000, loss[loss=0.09093, beats_loss=0.01022, ecapa_loss=0.0001883, whisper_loss=0.07883, over 21088.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001572, whisper_loss=0.09097, over 3899675.15 frames. ], batch size: 89, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:22:20,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2543720.0, ans=0.07 2024-08-14 07:22:30,518 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-14 07:22:32,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2543720.0, ans=0.125 2024-08-14 07:22:33,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2543820.0, ans=0.1 2024-08-14 07:22:49,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2543920.0, ans=0.2 2024-08-14 07:23:00,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2543920.0, ans=0.125 2024-08-14 07:23:00,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2024-08-14 07:23:07,393 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 07:23:10,511 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 07:23:13,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2544020.0, ans=0.2 2024-08-14 07:23:15,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-14 07:23:16,335 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 07:23:31,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-08-14 07:23:33,337 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8050, loss[loss=0.09517, beats_loss=0.01275, ecapa_loss=0.000168, whisper_loss=0.08074, over 21711.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01087, ecapa_loss=0.0001573, whisper_loss=0.09008, over 3869110.51 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:23:36,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2544220.0, ans=0.0 2024-08-14 07:23:40,830 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 07:23:41,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2544220.0, ans=0.1 2024-08-14 07:23:57,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2024-08-14 07:24:01,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2544420.0, ans=0.125 2024-08-14 07:24:07,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2024-08-14 07:24:10,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2024-08-14 07:24:20,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.420e+01 2.579e+01 3.062e+01 1.369e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-14 07:24:44,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2544720.0, ans=0.0 2024-08-14 07:24:45,535 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8100, loss[loss=0.09135, beats_loss=0.01191, ecapa_loss=0.0001668, whisper_loss=0.07777, over 21114.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001582, whisper_loss=0.09022, over 3883053.67 frames. ], batch size: 88, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:24:54,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2024-08-14 07:25:02,726 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 07:25:06,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2024-08-14 07:25:55,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8150, loss[loss=0.1238, beats_loss=0.007731, ecapa_loss=0.0001582, whisper_loss=0.1145, over 23123.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0108, ecapa_loss=0.0001572, whisper_loss=0.09015, over 3891505.25 frames. ], batch size: 91, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:25:57,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2545220.0, ans=0.1 2024-08-14 07:26:04,634 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 07:26:07,177 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 28 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-14 07:26:16,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2545320.0, ans=0.2 2024-08-14 07:26:17,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.33 vs. 
limit=15.0 2024-08-14 07:26:21,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2545320.0, ans=0.2 2024-08-14 07:26:24,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2545420.0, ans=0.125 2024-08-14 07:26:29,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-14 07:26:30,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-14 07:26:41,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.412e+01 2.672e+01 3.051e+01 4.273e+01, threshold=5.344e+01, percent-clipped=0.0 2024-08-14 07:27:01,125 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 07:27:06,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8200, loss[loss=0.09528, beats_loss=0.01229, ecapa_loss=0.0001586, whisper_loss=0.0814, over 21976.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001589, whisper_loss=0.09015, over 3903416.17 frames. ], batch size: 92, lr: 3.47e-03, grad_scale: 5.764607523034235e+17 2024-08-14 07:27:16,659 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 07:27:43,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2545920.0, ans=0.125 2024-08-14 07:28:06,729 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 07:28:10,904 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-14 07:28:18,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8250, loss[loss=0.1219, beats_loss=0.007312, ecapa_loss=0.0002181, whisper_loss=0.1124, over 17683.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.0001585, whisper_loss=0.08983, over 3887561.92 frames. ], batch size: 73, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:28:25,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2546220.0, ans=0.1 2024-08-14 07:28:28,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2546220.0, ans=0.125 2024-08-14 07:28:29,456 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 07:28:50,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2546420.0, ans=0.125 2024-08-14 07:29:03,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.321e+01 2.631e+01 2.915e+01 1.588e+02, threshold=5.262e+01, percent-clipped=1.0 2024-08-14 07:29:08,720 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 07:29:32,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8300, loss[loss=0.1163, beats_loss=0.009287, ecapa_loss=0.0001845, whisper_loss=0.1051, over 23024.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001585, whisper_loss=0.08993, over 3870385.26 frames. ], batch size: 95, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:29:42,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.22 vs. 
limit=15.0 2024-08-14 07:30:03,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2546920.0, ans=0.2 2024-08-14 07:30:08,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2546920.0, ans=0.0 2024-08-14 07:30:10,749 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 07:30:17,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=12.0 2024-08-14 07:30:48,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8350, loss[loss=0.09692, beats_loss=0.01327, ecapa_loss=0.0001306, whisper_loss=0.08235, over 16370.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001588, whisper_loss=0.09001, over 3883202.93 frames. ], batch size: 63, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:30:49,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2547220.0, ans=0.125 2024-08-14 07:30:50,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2547220.0, ans=0.0 2024-08-14 07:31:13,886 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 07:31:25,170 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-14 07:31:35,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.329e+01 2.543e+01 2.806e+01 3.860e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-14 07:31:56,283 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-14 07:31:56,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2547620.0, ans=0.5 2024-08-14 07:32:01,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8400, loss[loss=0.0978, beats_loss=0.01173, ecapa_loss=0.0001419, whisper_loss=0.08465, over 21202.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001595, whisper_loss=0.09054, over 3914477.15 frames. ], batch size: 89, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:32:04,943 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 07:32:08,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-14 07:32:15,117 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 07:32:26,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2547820.0, ans=0.0 2024-08-14 07:32:27,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2547820.0, ans=0.07 2024-08-14 07:32:30,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2547920.0, ans=0.0 2024-08-14 07:32:33,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2547920.0, ans=0.2 2024-08-14 07:32:43,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2548020.0, ans=0.04949747468305833 2024-08-14 07:32:59,684 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 07:33:02,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2548120.0, ans=0.0 2024-08-14 07:33:11,451 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 35 from LS+wenet, 28 from Vox, 22 fro AS 2024-08-14 07:33:13,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8450, loss[loss=0.09029, beats_loss=0.01114, ecapa_loss=0.0001195, whisper_loss=0.07796, over 17026.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001592, whisper_loss=0.09064, over 3897952.85 frames. ], batch size: 65, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:33:14,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2548220.0, ans=0.125 2024-08-14 07:33:40,130 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 07:33:59,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.373e+01 2.582e+01 3.046e+01 4.610e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 07:34:06,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2548520.0, ans=0.0 2024-08-14 07:34:13,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2548620.0, ans=0.1 2024-08-14 07:34:14,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2548620.0, ans=0.0 2024-08-14 07:34:25,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8500, loss[loss=0.1207, beats_loss=0.01018, ecapa_loss=0.0001527, whisper_loss=0.109, over 21604.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.000158, whisper_loss=0.09153, over 3924060.98 frames. 
], batch size: 83, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:34:30,392 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 07:35:18,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2549020.0, ans=0.0 2024-08-14 07:35:25,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2549120.0, ans=0.2 2024-08-14 07:35:26,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2549120.0, ans=0.1 2024-08-14 07:35:36,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8550, loss[loss=0.1299, beats_loss=0.008022, ecapa_loss=0.0002011, whisper_loss=0.1199, over 14102.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001578, whisper_loss=0.09159, over 3912672.32 frames. ], batch size: 55, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:35:39,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2549220.0, ans=0.125 2024-08-14 07:35:43,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2549220.0, ans=0.0 2024-08-14 07:35:44,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2549220.0, ans=0.2 2024-08-14 07:35:59,464 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 07:36:13,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=2549420.0, ans=0.2 2024-08-14 07:36:22,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.413e+01 2.608e+01 2.932e+01 1.178e+02, threshold=5.217e+01, percent-clipped=2.0 2024-08-14 07:36:23,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2549520.0, ans=0.0 2024-08-14 07:36:27,401 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 07:36:41,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2549620.0, ans=0.0 2024-08-14 07:36:50,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8600, loss[loss=0.09807, beats_loss=0.01009, ecapa_loss=0.0001682, whisper_loss=0.08629, over 20186.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001577, whisper_loss=0.09178, over 3865652.22 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 1.152921504606847e+18 2024-08-14 07:36:53,318 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 07:37:18,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2549820.0, ans=0.125 2024-08-14 07:37:20,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2549820.0, ans=0.0 2024-08-14 07:37:36,148 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
19 from LS+wenet, 17 from Vox, 37 from AS
2024-08-14 07:37:45,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2550020.0, ans=0.2
2024-08-14 07:37:48,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2550020.0, ans=0.0
2024-08-14 07:37:54,169 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 from AS
2024-08-14 07:38:06,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2550120.0, ans=0.125
2024-08-14 07:38:09,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8650, loss[loss=0.1062, beats_loss=0.009437, ecapa_loss=0.0001949, whisper_loss=0.09481, over 19965.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001574, whisper_loss=0.09117, over 3862571.28 frames. ], batch size: 84, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:38:27,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2550320.0, ans=0.1
2024-08-14 07:38:35,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2550320.0, ans=0.125
2024-08-14 07:38:38,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2550420.0, ans=0.0
2024-08-14 07:38:41,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2024-08-14 07:38:48,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0
2024-08-14 07:38:54,928 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
25 from LS+wenet, 29 from Vox, 37 from AS
2024-08-14 07:38:56,356 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.324e+01 2.530e+01 2.821e+01 3.799e+01, threshold=5.059e+01, percent-clipped=0.0
2024-08-14 07:39:09,988 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 07:39:13,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0
2024-08-14 07:39:17,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2550620.0, ans=0.125
2024-08-14 07:39:20,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8700, loss[loss=0.09317, beats_loss=0.009921, ecapa_loss=0.0001645, whisper_loss=0.0816, over 17819.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.000158, whisper_loss=0.09095, over 3873006.41 frames. ], batch size: 71, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:39:21,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2550720.0, ans=0.0
2024-08-14 07:39:26,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2550720.0, ans=0.04949747468305833
2024-08-14 07:39:28,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2550720.0, ans=0.0
2024-08-14 07:39:49,514 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 20 from Vox, 16 from AS
2024-08-14 07:39:59,310 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 from AS
2024-08-14 07:40:15,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.14 vs.
limit=22.5
2024-08-14 07:40:17,811 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 07:40:31,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8750, loss[loss=0.1077, beats_loss=0.00899, ecapa_loss=0.0001255, whisper_loss=0.09743, over 15666.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001575, whisper_loss=0.09107, over 3861540.70 frames. ], batch size: 57, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:40:39,277 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 from AS
2024-08-14 07:40:43,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2551220.0, ans=0.0
2024-08-14 07:41:16,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.282e+01 2.583e+01 2.856e+01 3.464e+01, threshold=5.167e+01, percent-clipped=0.0
2024-08-14 07:41:28,350 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 from AS
2024-08-14 07:41:39,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2551620.0, ans=0.0
2024-08-14 07:41:42,156 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8800, loss[loss=0.1136, beats_loss=0.009781, ecapa_loss=0.0001547, whisper_loss=0.1022, over 15801.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001561, whisper_loss=0.09147, over 3921983.89 frames. ], batch size: 64, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:42:01,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2551820.0, ans=0.1
2024-08-14 07:42:02,711 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 16 from Vox, 38 from AS
2024-08-14 07:42:16,727 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
25 from LS+wenet, 24 from Vox, 37 from AS
2024-08-14 07:42:18,289 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 07:42:43,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2552120.0, ans=0.125
2024-08-14 07:42:46,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2552120.0, ans=0.04949747468305833
2024-08-14 07:42:49,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0
2024-08-14 07:42:54,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8850, loss[loss=0.1243, beats_loss=0.01062, ecapa_loss=0.0001505, whisper_loss=0.1122, over 23106.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001553, whisper_loss=0.092, over 3939279.25 frames. ], batch size: 91, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:42:56,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2552220.0, ans=0.1
2024-08-14 07:43:07,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2552320.0, ans=0.125
2024-08-14 07:43:39,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.372e+01 2.652e+01 3.112e+01 4.829e+01, threshold=5.305e+01, percent-clipped=0.0
2024-08-14 07:43:40,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=12.0
2024-08-14 07:43:57,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs.
limit=15.0
2024-08-14 07:44:05,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8900, loss[loss=0.1153, beats_loss=0.008398, ecapa_loss=0.0001662, whisper_loss=0.1052, over 23946.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001557, whisper_loss=0.09183, over 3957475.79 frames. ], batch size: 90, lr: 3.46e-03, grad_scale: 1.152921504606847e+18
2024-08-14 07:44:05,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2552720.0, ans=0.125
2024-08-14 07:44:17,656 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-14 07:44:27,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2552820.0, ans=0.1
2024-08-14 07:44:30,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2552820.0, ans=0.1
2024-08-14 07:44:41,022 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 from AS
2024-08-14 07:44:47,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0
2024-08-14 07:44:53,827 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.217e-01
2024-08-14 07:44:59,297 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 from AS
2024-08-14 07:45:10,411 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts.
21 from LS+wenet, 23 from Vox, 33 from AS
2024-08-14 07:45:10,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2553120.0, ans=0.125
2024-08-14 07:45:12,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2553120.0, ans=0.0
2024-08-14 07:45:16,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 8950, loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001327, whisper_loss=0.08916, over 16005.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.000156, whisper_loss=0.09185, over 3905746.20 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:45:24,386 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 from AS
2024-08-14 07:45:27,082 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 from AS
2024-08-14 07:45:47,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2553420.0, ans=0.125
2024-08-14 07:46:03,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.450e+01 2.761e+01 3.148e+01 4.518e+01, threshold=5.522e+01, percent-clipped=0.0
2024-08-14 07:46:05,485 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 07:46:15,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2553620.0, ans=0.125
2024-08-14 07:46:27,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9000, loss[loss=0.08474, beats_loss=0.01003, ecapa_loss=0.0001165, whisper_loss=0.07355, over 16495.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001559, whisper_loss=0.09149, over 3906301.78 frames.
], batch size: 57, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:46:27,836 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss
2024-08-14 07:47:08,598 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005502, whisper_loss=0.2473, over 922467.00 frames.
2024-08-14 07:47:28,552 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on SV_voxceleb1: loss=0.004391, beats_loss=0, ecapa_loss=0.0004391, whisper_loss=0, over 939242.00 frames.
2024-08-14 07:47:42,373 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9230, 3.1791, 3.6338, 3.6326], device='cuda:3')
2024-08-14 07:49:28,281 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on AT_audioset: loss=0.02358, beats_loss=0.02358, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-14 07:49:28,285 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB
2024-08-14 07:49:28,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2553720.0, ans=0.0
2024-08-14 07:49:36,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs.
limit=15.0
2024-08-14 07:49:37,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2553720.0, ans=0.025
2024-08-14 07:49:46,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2553820.0, ans=0.125
2024-08-14 07:49:48,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2553820.0, ans=0.1
2024-08-14 07:49:55,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2553920.0, ans=0.2
2024-08-14 07:50:05,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2553920.0, ans=0.125
2024-08-14 07:50:05,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2553920.0, ans=0.125
2024-08-14 07:50:30,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0
2024-08-14 07:50:38,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9050, loss[loss=0.1153, beats_loss=0.01104, ecapa_loss=0.0001464, whisper_loss=0.1028, over 22850.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01065, ecapa_loss=0.0001566, whisper_loss=0.09136, over 3894314.18 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:50:43,531 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 from AS
2024-08-14 07:50:46,274 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 07:51:10,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs.
limit=15.0
2024-08-14 07:51:27,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.420e+01 2.680e+01 3.001e+01 5.357e+01, threshold=5.359e+01, percent-clipped=0.0
2024-08-14 07:51:34,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2554520.0, ans=0.1
2024-08-14 07:51:36,126 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 17 from Vox, 46 from AS
2024-08-14 07:51:36,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2554520.0, ans=0.04949747468305833
2024-08-14 07:51:39,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-08-14 07:51:41,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2554620.0, ans=0.125
2024-08-14 07:51:55,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9100, loss[loss=0.1086, beats_loss=0.009756, ecapa_loss=0.0001806, whisper_loss=0.09705, over 16058.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01055, ecapa_loss=0.0001579, whisper_loss=0.09222, over 3885158.69 frames. ], batch size: 68, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:52:02,136 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 from AS
2024-08-14 07:52:11,623 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 from AS
2024-08-14 07:52:34,270 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 07:52:35,278 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-14 07:52:37,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2554920.0, ans=0.035
2024-08-14 07:52:46,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0
2024-08-14 07:52:55,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2555020.0, ans=0.125
2024-08-14 07:53:06,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2555120.0, ans=0.035
2024-08-14 07:53:07,847 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 from AS
2024-08-14 07:53:11,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9150, loss[loss=0.09828, beats_loss=0.006944, ecapa_loss=0.0001751, whisper_loss=0.08959, over 17839.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0106, ecapa_loss=0.0001574, whisper_loss=0.09233, over 3899037.82 frames. ], batch size: 70, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:53:16,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2555220.0, ans=0.0
2024-08-14 07:53:25,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=22.5
2024-08-14 07:53:35,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2555320.0, ans=0.1
2024-08-14 07:53:44,884 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
18 from LS+wenet, 22 from Vox, 31 from AS
2024-08-14 07:53:48,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2555420.0, ans=0.125
2024-08-14 07:53:53,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2024-08-14 07:53:57,993 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.410e+01 2.700e+01 3.056e+01 6.075e+01, threshold=5.399e+01, percent-clipped=3.0
2024-08-14 07:54:02,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2555520.0, ans=0.125
2024-08-14 07:54:08,579 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 07:54:15,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 07:54:16,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2555620.0, ans=0.0
2024-08-14 07:54:20,761 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 33 from LS+wenet, 19 from Vox, 33 from AS
2024-08-14 07:54:21,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9200, loss[loss=0.1224, beats_loss=0.009569, ecapa_loss=0.0001655, whisper_loss=0.1112, over 22040.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01057, ecapa_loss=0.0001581, whisper_loss=0.09194, over 3924633.45 frames. ], batch size: 85, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:54:54,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.00 vs.
limit=15.0
2024-08-14 07:55:04,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2556020.0, ans=0.125
2024-08-14 07:55:09,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2556020.0, ans=0.125
2024-08-14 07:55:15,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2556020.0, ans=0.2
2024-08-14 07:55:17,695 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 33 from Vox, 30 from AS
2024-08-14 07:55:22,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2556120.0, ans=0.125
2024-08-14 07:55:30,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2556120.0, ans=0.125
2024-08-14 07:55:33,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9250, loss[loss=0.1083, beats_loss=0.01011, ecapa_loss=0.0001686, whisper_loss=0.09647, over 16896.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001592, whisper_loss=0.09131, over 3890080.97 frames. ], batch size: 67, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:55:47,917 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 17 from LS+wenet, 27 from Vox, 45 from AS
2024-08-14 07:55:49,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2556320.0, ans=0.125
2024-08-14 07:55:55,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0
2024-08-14 07:56:07,577 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
24 from LS+wenet, 25 from Vox, 30 from AS
2024-08-14 07:56:15,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=2556520.0, ans=12.0
2024-08-14 07:56:20,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.326e+01 2.739e+01 3.130e+01 4.617e+01, threshold=5.478e+01, percent-clipped=0.0
2024-08-14 07:56:21,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.99 vs. limit=10.0
2024-08-14 07:56:32,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5
2024-08-14 07:56:35,964 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 25 from Vox, 29 from AS
2024-08-14 07:56:41,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2556620.0, ans=0.0
2024-08-14 07:56:42,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2024-08-14 07:56:43,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9300, loss[loss=0.1148, beats_loss=0.009975, ecapa_loss=0.0001737, whisper_loss=0.1031, over 23374.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001578, whisper_loss=0.09128, over 3901175.12 frames. ], batch size: 93, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:57:12,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2556920.0, ans=0.125
2024-08-14 07:57:33,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0
2024-08-14 07:57:36,332 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts.
28 from LS+wenet, 26 from Vox, 29 from AS
2024-08-14 07:57:38,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.16 vs. limit=22.5
2024-08-14 07:57:43,349 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 07:57:56,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9350, loss[loss=0.07416, beats_loss=0.01221, ecapa_loss=0.0002012, whisper_loss=0.05993, over 19164.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001575, whisper_loss=0.09086, over 3892193.80 frames. ], batch size: 82, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:58:08,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=12.0
2024-08-14 07:58:11,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2557320.0, ans=0.0
2024-08-14 07:58:20,154 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 from AS
2024-08-14 07:58:20,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0
2024-08-14 07:58:42,295 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.320e+01 2.600e+01 2.954e+01 6.976e+01, threshold=5.200e+01, percent-clipped=1.0
2024-08-14 07:58:49,920 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 from AS
2024-08-14 07:58:53,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2557620.0, ans=0.125
2024-08-14 07:58:58,507 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
30 from LS+wenet, 22 from Vox, 40 from AS
2024-08-14 07:59:01,324 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 from AS
2024-08-14 07:59:05,850 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 07:59:06,881 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9400, loss[loss=0.104, beats_loss=0.01025, ecapa_loss=0.0001737, whisper_loss=0.09202, over 15112.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001573, whisper_loss=0.09051, over 3866746.73 frames. ], batch size: 60, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 07:59:13,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0
2024-08-14 07:59:22,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2557820.0, ans=15.0
2024-08-14 07:59:24,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.92 vs. limit=22.5
2024-08-14 07:59:25,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0
2024-08-14 07:59:44,986 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
21 from LS+wenet, 11 from Vox, 23 from AS
2024-08-14 07:59:50,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2558020.0, ans=0.0
2024-08-14 08:00:09,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2558120.0, ans=0.125
2024-08-14 08:00:17,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9450, loss[loss=0.1001, beats_loss=0.01242, ecapa_loss=0.000143, whisper_loss=0.08624, over 22702.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001568, whisper_loss=0.09073, over 3874461.44 frames. ], batch size: 90, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 08:00:17,976 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 08:00:23,439 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 08:00:30,507 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 from AS
2024-08-14 08:00:40,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2558320.0, ans=0.0
2024-08-14 08:01:04,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.392e+01 2.683e+01 2.998e+01 2.159e+02, threshold=5.366e+01, percent-clipped=1.0
2024-08-14 08:01:14,769 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 24 from Vox, 36 from AS
2024-08-14 08:01:21,733 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS
2024-08-14 08:01:28,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9500, loss[loss=0.1099, beats_loss=0.009944, ecapa_loss=0.0001779, whisper_loss=0.09813, over 23658.00 frames.
], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001569, whisper_loss=0.0903, over 3878306.22 frames. ], batch size: 94, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 08:01:38,534 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 from AS
2024-08-14 08:01:49,022 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 16 from LS+wenet, 30 from Vox, 39 from AS
2024-08-14 08:01:53,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2558820.0, ans=0.125
2024-08-14 08:01:58,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0
2024-08-14 08:01:59,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2558920.0, ans=0.125
2024-08-14 08:02:05,002 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.870e-01
2024-08-14 08:02:11,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2559020.0, ans=0.2
2024-08-14 08:02:18,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2559020.0, ans=0.125
2024-08-14 08:02:18,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2559020.0, ans=0.1
2024-08-14 08:02:33,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0
2024-08-14 08:02:39,856 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9550, loss[loss=0.1289, beats_loss=0.007468, ecapa_loss=0.0001941, whisper_loss=0.1195, over 15969.00 frames.
], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001576, whisper_loss=0.09052, over 3884691.45 frames. ], batch size: 64, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 08:02:54,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2559320.0, ans=0.0
2024-08-14 08:03:12,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5
2024-08-14 08:03:26,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.380e+01 2.664e+01 3.149e+01 1.810e+02, threshold=5.328e+01, percent-clipped=2.0
2024-08-14 08:03:27,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2559520.0, ans=0.0
2024-08-14 08:03:32,417 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS
2024-08-14 08:03:50,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9600, loss[loss=0.08139, beats_loss=0.00892, ecapa_loss=0.000204, whisper_loss=0.07043, over 14287.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001576, whisper_loss=0.09168, over 3889457.89 frames. ], batch size: 59, lr: 3.46e-03, grad_scale: 5.764607523034235e+17
2024-08-14 08:03:51,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2559720.0, ans=0.2
2024-08-14 08:04:01,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2559720.0, ans=0.125
2024-08-14 08:04:35,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2559920.0, ans=0.125
2024-08-14 08:04:58,083 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
20 from LS+wenet, 13 from Vox, 26 from AS 2024-08-14 08:05:01,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2560120.0, ans=0.0 2024-08-14 08:05:01,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2560120.0, ans=0.125 2024-08-14 08:05:03,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2560120.0, ans=0.125 2024-08-14 08:05:06,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9650, loss[loss=0.09858, beats_loss=0.01006, ecapa_loss=0.0001625, whisper_loss=0.08689, over 20112.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01056, ecapa_loss=0.0001578, whisper_loss=0.09208, over 3880044.55 frames. ], batch size: 83, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:05:10,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2560220.0, ans=0.1 2024-08-14 08:05:40,168 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 10 from Vox, 34 from AS 2024-08-14 08:05:50,337 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 08:05:52,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.378e+01 2.605e+01 3.057e+01 7.649e+01, threshold=5.209e+01, percent-clipped=3.0 2024-08-14 08:06:06,014 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.417e+01 2024-08-14 08:06:08,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2560620.0, ans=0.125 2024-08-14 08:06:11,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.69 vs. 
limit=10.0 2024-08-14 08:06:16,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9700, loss[loss=0.09007, beats_loss=0.01383, ecapa_loss=0.000149, whisper_loss=0.07475, over 21657.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001582, whisper_loss=0.0913, over 3858647.15 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:06:16,991 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 27 from Vox, 22 from AS 2024-08-14 08:06:18,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2560720.0, ans=0.125 2024-08-14 08:06:35,426 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 from AS 2024-08-14 08:07:20,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2561120.0, ans=0.0 2024-08-14 08:07:23,223 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS 2024-08-14 08:07:28,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9750, loss[loss=0.08398, beats_loss=0.009802, ecapa_loss=0.0001812, whisper_loss=0.07236, over 13658.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001597, whisper_loss=0.09155, over 3827775.99 frames. ], batch size: 53, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:08:16,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.202e+01 2.404e+01 2.626e+01 3.852e+01, threshold=4.808e+01, percent-clipped=0.0 2024-08-14 08:08:32,322 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 25 from Vox, 31 from AS 2024-08-14 08:08:40,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9800, loss[loss=0.08404, beats_loss=0.01364, ecapa_loss=0.0001396, whisper_loss=0.069, over 18166.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001577, whisper_loss=0.09039, over 3786170.64 frames. ], batch size: 77, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:08:43,545 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 from AS 2024-08-14 08:09:05,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2561820.0, ans=0.0 2024-08-14 08:09:10,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2561920.0, ans=0.1 2024-08-14 08:09:10,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2561920.0, ans=0.0 2024-08-14 08:09:11,886 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 from AS 2024-08-14 08:09:16,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2561920.0, ans=0.1 2024-08-14 08:09:22,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2562020.0, ans=0.0 2024-08-14 08:09:26,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2562020.0, ans=0.125 2024-08-14 08:09:26,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5 2024-08-14 08:09:47,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.04 vs. 
limit=12.0 2024-08-14 08:09:49,721 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.464e-02 2024-08-14 08:09:50,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9850, loss[loss=0.1053, beats_loss=0.01228, ecapa_loss=0.0001709, whisper_loss=0.09135, over 21076.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.0001564, whisper_loss=0.09, over 3793553.22 frames. ], batch size: 90, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:10:05,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2562320.0, ans=0.0 2024-08-14 08:10:06,139 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 33 from Vox, 41 from AS 2024-08-14 08:10:17,589 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-14 08:10:33,205 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.080e+01 2024-08-14 08:10:36,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.409e+01 2.672e+01 2.970e+01 5.427e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 08:10:44,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2562520.0, ans=0.125 2024-08-14 08:10:52,131 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 23 from LS+wenet, 33 from Vox, 41 from AS 2024-08-14 08:11:00,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9900, loss[loss=0.09505, beats_loss=0.01221, ecapa_loss=0.0001692, whisper_loss=0.08115, over 13254.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01089, ecapa_loss=0.0001563, whisper_loss=0.08924, over 3844503.57 frames. 
], batch size: 55, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:11:08,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2562720.0, ans=0.2 2024-08-14 08:11:29,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2562920.0, ans=0.05 2024-08-14 08:11:35,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2562920.0, ans=0.125 2024-08-14 08:11:35,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2562920.0, ans=0.125 2024-08-14 08:11:54,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2563020.0, ans=0.1 2024-08-14 08:12:03,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2563120.0, ans=0.1 2024-08-14 08:12:11,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 9950, loss[loss=0.1104, beats_loss=0.01043, ecapa_loss=0.000201, whisper_loss=0.09795, over 21257.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01086, ecapa_loss=0.0001583, whisper_loss=0.08977, over 3853887.05 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:12:47,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2563420.0, ans=0.125 2024-08-14 08:12:47,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2563420.0, ans=0.2 2024-08-14 08:12:52,988 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 from AS 2024-08-14 08:12:58,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.310e+01 2.516e+01 2.952e+01 4.420e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-14 08:13:07,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2563620.0, ans=0.1 2024-08-14 08:13:18,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2563620.0, ans=0.5 2024-08-14 08:13:22,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10000, loss[loss=0.1009, beats_loss=0.01039, ecapa_loss=0.0001743, whisper_loss=0.08874, over 18992.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001591, whisper_loss=0.09034, over 3853423.12 frames. ], batch size: 79, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:13:38,606 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 08:13:40,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2563820.0, ans=0.0 2024-08-14 08:13:44,115 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.548e-03 2024-08-14 08:13:57,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2024-08-14 08:14:28,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.68 vs. 
limit=22.5 2024-08-14 08:14:32,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564220.0, ans=0.1 2024-08-14 08:14:33,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10050, loss[loss=0.1121, beats_loss=0.009948, ecapa_loss=0.0001607, whisper_loss=0.1005, over 22263.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001583, whisper_loss=0.09119, over 3874464.97 frames. ], batch size: 88, lr: 3.46e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:14:41,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564220.0, ans=0.1 2024-08-14 08:15:00,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2564320.0, ans=0.035 2024-08-14 08:15:09,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2564420.0, ans=0.1 2024-08-14 08:15:22,194 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.332e+01 2.576e+01 2.987e+01 4.902e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 08:15:27,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564520.0, ans=0.1 2024-08-14 08:15:28,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2564520.0, ans=0.0 2024-08-14 08:15:30,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564520.0, ans=0.1 2024-08-14 08:15:32,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2564620.0, ans=0.125 2024-08-14 08:15:41,860 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2564620.0, ans=0.0 2024-08-14 08:15:47,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10100, loss[loss=0.09753, beats_loss=0.01202, ecapa_loss=0.0001715, whisper_loss=0.0838, over 17897.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01068, ecapa_loss=0.0001585, whisper_loss=0.09162, over 3874491.27 frames. ], batch size: 75, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:15:50,836 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 from AS 2024-08-14 08:15:52,184 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 12 from Vox, 34 from AS 2024-08-14 08:16:21,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2564920.0, ans=0.0 2024-08-14 08:16:32,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2564920.0, ans=0.1 2024-08-14 08:16:38,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2565020.0, ans=0.125 2024-08-14 08:16:40,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-14 08:16:44,759 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 from AS 2024-08-14 08:16:50,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-14 08:17:07,855 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 from AS 2024-08-14 08:17:10,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.37 vs. 
limit=12.0 2024-08-14 08:17:12,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10150, loss[loss=0.1113, beats_loss=0.01239, ecapa_loss=0.0001412, whisper_loss=0.09749, over 23016.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001579, whisper_loss=0.09097, over 3894590.65 frames. ], batch size: 91, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:17:22,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2565220.0, ans=0.0 2024-08-14 08:17:34,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2565320.0, ans=0.125 2024-08-14 08:17:47,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=15.0 2024-08-14 08:17:48,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2565420.0, ans=0.125 2024-08-14 08:18:08,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.364e+01 2.632e+01 2.890e+01 4.484e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-14 08:18:25,360 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 22 from LS+wenet, 20 from Vox, 54 from AS 2024-08-14 08:18:29,269 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 29 from Vox, 30 from AS 2024-08-14 08:18:36,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10200, loss[loss=0.09901, beats_loss=0.01142, ecapa_loss=0.0001471, whisper_loss=0.08612, over 20708.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001572, whisper_loss=0.09078, over 3887370.60 frames. 
], batch size: 82, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:18:44,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2565720.0, ans=0.125 2024-08-14 08:18:59,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2565820.0, ans=0.125 2024-08-14 08:19:11,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2565920.0, ans=0.125 2024-08-14 08:19:16,011 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 17 from Vox, 34 from AS 2024-08-14 08:19:27,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2565920.0, ans=0.125 2024-08-14 08:19:28,210 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 from AS 2024-08-14 08:19:39,695 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 from AS 2024-08-14 08:19:41,133 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 from AS 2024-08-14 08:20:02,852 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 9 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 08:20:04,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10250, loss[loss=0.05653, beats_loss=0.015, ecapa_loss=0.0001624, whisper_loss=0.03991, over 16057.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01079, ecapa_loss=0.0001566, whisper_loss=0.09042, over 3893890.40 frames. ], batch size: 68, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:20:11,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2566220.0, ans=0.1 2024-08-14 08:20:14,743 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
31 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 08:20:21,000 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 32 from Vox, 36 from AS 2024-08-14 08:20:25,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2566320.0, ans=0.1 2024-08-14 08:20:57,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.331e+01 2.528e+01 2.980e+01 4.721e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-14 08:20:59,539 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 34 from Vox, 20 from AS 2024-08-14 08:21:26,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10300, loss[loss=0.08211, beats_loss=0.01184, ecapa_loss=0.0001267, whisper_loss=0.069, over 16577.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001569, whisper_loss=0.09077, over 3912605.60 frames. ], batch size: 65, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:21:57,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2566820.0, ans=0.95 2024-08-14 08:21:59,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2566920.0, ans=0.125 2024-08-14 08:22:17,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2567020.0, ans=0.125 2024-08-14 08:22:26,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2567020.0, ans=0.0 2024-08-14 08:22:36,599 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 08:22:50,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2567220.0, ans=0.0 2024-08-14 08:22:50,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2567220.0, ans=0.125 2024-08-14 08:22:51,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10350, loss[loss=0.1211, beats_loss=0.008942, ecapa_loss=0.000148, whisper_loss=0.1107, over 15391.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001562, whisper_loss=0.09133, over 3931771.70 frames. ], batch size: 58, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:23:11,281 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 from AS 2024-08-14 08:23:20,897 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 from AS 2024-08-14 08:23:45,842 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 from AS 2024-08-14 08:23:51,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2567520.0, ans=0.125 2024-08-14 08:23:51,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.354e+01 2.584e+01 2.935e+01 4.636e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 08:24:22,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10400, loss[loss=0.1235, beats_loss=0.01132, ecapa_loss=0.000139, whisper_loss=0.1107, over 14676.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001569, whisper_loss=0.09063, over 3886875.70 frames. 
], batch size: 56, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:24:32,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2567720.0, ans=0.1 2024-08-14 08:24:49,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-08-14 08:24:53,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2567820.0, ans=0.125 2024-08-14 08:24:56,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2567820.0, ans=0.125 2024-08-14 08:25:08,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2567920.0, ans=0.035 2024-08-14 08:25:24,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2568020.0, ans=0.0 2024-08-14 08:25:28,931 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 38 from LS+wenet, 17 from Vox, 40 from AS 2024-08-14 08:25:32,577 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-14 08:25:49,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10450, loss[loss=0.111, beats_loss=0.009092, ecapa_loss=0.0001673, whisper_loss=0.1003, over 22124.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01071, ecapa_loss=0.0001576, whisper_loss=0.09076, over 3873308.64 frames. 
], batch size: 88, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:25:55,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2568220.0, ans=0.1 2024-08-14 08:26:02,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2568220.0, ans=0.2 2024-08-14 08:26:06,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-14 08:26:12,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2568320.0, ans=0.0 2024-08-14 08:26:12,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2568320.0, ans=0.1 2024-08-14 08:26:22,388 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 26 from Vox, 18 from AS 2024-08-14 08:26:31,799 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 9 from LS+wenet, 24 from Vox, 21 from AS 2024-08-14 08:26:41,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.325e+01 2.620e+01 2.982e+01 4.539e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-14 08:27:02,807 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 from AS 2024-08-14 08:27:06,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2568620.0, ans=0.0 2024-08-14 08:27:08,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10500, loss[loss=0.09645, beats_loss=0.006578, ecapa_loss=0.0002322, whisper_loss=0.08755, over 15281.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001593, whisper_loss=0.09032, over 3850996.38 frames. 
], batch size: 60, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:27:14,011 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 from AS 2024-08-14 08:27:17,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2568720.0, ans=0.0 2024-08-14 08:27:34,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2568820.0, ans=0.125 2024-08-14 08:27:35,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2568820.0, ans=0.125 2024-08-14 08:27:48,079 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-14 08:27:53,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2568920.0, ans=0.0 2024-08-14 08:27:53,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2568920.0, ans=0.125 2024-08-14 08:28:10,231 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 from AS 2024-08-14 08:28:33,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2569120.0, ans=0.0 2024-08-14 08:28:41,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10550, loss[loss=0.1161, beats_loss=0.009502, ecapa_loss=0.0001769, whisper_loss=0.1049, over 14717.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001592, whisper_loss=0.08985, over 3828439.05 frames. ], batch size: 61, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:28:41,675 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 16 from Vox, 35 from AS 2024-08-14 08:28:43,031 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 15 from Vox, 24 from AS 2024-08-14 08:29:14,498 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 08:29:17,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2569420.0, ans=0.125 2024-08-14 08:29:23,771 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 from AS 2024-08-14 08:29:26,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2569420.0, ans=0.125 2024-08-14 08:29:31,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2569420.0, ans=0.125 2024-08-14 08:29:41,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.277e+01 2.540e+01 2.895e+01 1.094e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-14 08:29:45,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2569520.0, ans=0.2 2024-08-14 08:29:51,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5 2024-08-14 08:30:06,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10600, loss[loss=0.1037, beats_loss=0.01128, ecapa_loss=0.0001418, whisper_loss=0.09099, over 22730.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01084, ecapa_loss=0.0001568, whisper_loss=0.08961, over 3838528.18 frames. 
], batch size: 92, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:30:09,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2569720.0, ans=0.125 2024-08-14 08:30:41,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2569920.0, ans=0.125 2024-08-14 08:30:47,556 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-14 08:30:57,594 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 20 from Vox, 26 from AS 2024-08-14 08:31:05,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2570020.0, ans=0.0 2024-08-14 08:31:12,857 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 from AS 2024-08-14 08:31:21,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2570120.0, ans=0.0 2024-08-14 08:31:24,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10650, loss[loss=0.1218, beats_loss=0.01181, ecapa_loss=0.0001238, whisper_loss=0.1087, over 18722.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01084, ecapa_loss=0.0001565, whisper_loss=0.09046, over 3856686.94 frames. ], batch size: 72, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:31:30,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-14 08:31:31,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2570220.0, ans=0.125 2024-08-14 08:31:41,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. 
limit=15.0 2024-08-14 08:31:45,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2570320.0, ans=0.0 2024-08-14 08:32:19,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=22.5 2024-08-14 08:32:21,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.360e+01 2.616e+01 3.033e+01 9.241e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-14 08:32:25,216 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 16 from Vox, 29 from AS 2024-08-14 08:32:38,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5 2024-08-14 08:32:52,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10700, loss[loss=0.1067, beats_loss=0.01199, ecapa_loss=0.0001366, whisper_loss=0.09332, over 23139.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01086, ecapa_loss=0.0001554, whisper_loss=0.09034, over 3855483.82 frames. ], batch size: 91, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:32:57,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2570720.0, ans=0.125 2024-08-14 08:32:59,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2570720.0, ans=0.0 2024-08-14 08:33:04,937 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
32 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-14 08:33:12,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2570820.0, ans=0.2 2024-08-14 08:33:14,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2570820.0, ans=0.125 2024-08-14 08:33:25,171 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-14 08:33:26,828 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-14 08:33:39,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2570920.0, ans=0.2 2024-08-14 08:33:44,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2570920.0, ans=0.1 2024-08-14 08:33:49,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2571020.0, ans=0.2 2024-08-14 08:34:04,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2571120.0, ans=0.125 2024-08-14 08:34:06,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2024-08-14 08:34:07,428 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 08:34:20,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2571220.0, ans=0.0 2024-08-14 08:34:21,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10750, loss[loss=0.111, beats_loss=0.01301, ecapa_loss=0.0001573, whisper_loss=0.09639, over 20858.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01079, ecapa_loss=0.000157, whisper_loss=0.09101, over 3872423.13 frames. ], batch size: 87, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:34:28,531 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 27 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-14 08:34:37,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2571320.0, ans=0.125 2024-08-14 08:34:45,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2571320.0, ans=0.2 2024-08-14 08:34:46,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2571320.0, ans=0.125 2024-08-14 08:34:48,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2571320.0, ans=0.0 2024-08-14 08:35:06,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2571520.0, ans=0.125 2024-08-14 08:35:13,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.456e+01 2.714e+01 3.010e+01 3.209e+02, threshold=5.428e+01, percent-clipped=1.0 2024-08-14 08:35:34,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=22.5 2024-08-14 08:35:38,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10800, loss[loss=0.1339, beats_loss=0.007789, ecapa_loss=0.0001822, whisper_loss=0.1243, over 23456.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001567, whisper_loss=0.09156, over 3881671.61 frames. 
], batch size: 90, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:35:45,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2571720.0, ans=0.0 2024-08-14 08:35:49,233 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 08:35:51,927 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-14 08:36:09,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2571920.0, ans=0.1 2024-08-14 08:36:14,528 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 08:36:34,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=12.0 2024-08-14 08:36:38,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-14 08:36:51,461 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 08:36:52,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2024-08-14 08:36:52,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10850, loss[loss=0.1085, beats_loss=0.01175, ecapa_loss=0.0001785, whisper_loss=0.09502, over 14731.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0108, ecapa_loss=0.0001566, whisper_loss=0.09184, over 3883582.21 frames. 
], batch size: 61, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:37:13,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2572320.0, ans=0.125 2024-08-14 08:37:30,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2572420.0, ans=0.2 2024-08-14 08:37:35,155 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 08:37:35,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2572420.0, ans=0.1 2024-08-14 08:37:45,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.429e+01 2.769e+01 3.241e+01 1.860e+02, threshold=5.537e+01, percent-clipped=2.0 2024-08-14 08:37:50,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2572520.0, ans=0.125 2024-08-14 08:38:02,860 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 08:38:16,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2572720.0, ans=0.0 2024-08-14 08:38:17,624 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10900, loss[loss=0.09212, beats_loss=0.01186, ecapa_loss=0.000166, whisper_loss=0.07861, over 20580.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0107, ecapa_loss=0.0001557, whisper_loss=0.09211, over 3893367.50 frames. 
], batch size: 86, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:38:29,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2572720.0, ans=0.125 2024-08-14 08:38:31,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2572720.0, ans=0.125 2024-08-14 08:38:34,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2572820.0, ans=0.125 2024-08-14 08:38:55,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2572920.0, ans=0.2 2024-08-14 08:39:05,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2572920.0, ans=0.125 2024-08-14 08:39:20,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2573020.0, ans=0.0 2024-08-14 08:39:41,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2573120.0, ans=0.125 2024-08-14 08:39:47,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 10950, loss[loss=0.1176, beats_loss=0.009272, ecapa_loss=0.0002087, whisper_loss=0.1063, over 19535.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01064, ecapa_loss=0.0001561, whisper_loss=0.09301, over 3923829.31 frames. 
], batch size: 78, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:39:50,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2573220.0, ans=0.125 2024-08-14 08:39:50,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2573220.0, ans=0.125 2024-08-14 08:39:52,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.94 vs. limit=10.0 2024-08-14 08:39:53,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2573220.0, ans=0.0 2024-08-14 08:40:10,012 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 08:40:10,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2573320.0, ans=0.0 2024-08-14 08:40:13,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2573320.0, ans=0.1 2024-08-14 08:40:14,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2573320.0, ans=0.2 2024-08-14 08:40:14,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2573320.0, ans=0.0 2024-08-14 08:40:15,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2024-08-14 08:40:17,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2573420.0, ans=0.1 2024-08-14 08:40:20,381 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 08:40:29,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2573420.0, ans=0.1 2024-08-14 08:40:30,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2573520.0, ans=0.09899494936611666 2024-08-14 08:40:35,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2573520.0, ans=0.125 2024-08-14 08:40:37,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.432e+01 2.668e+01 2.934e+01 4.215e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 08:40:46,459 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 08:40:52,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2573620.0, ans=0.125 2024-08-14 08:41:05,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11000, loss[loss=0.1086, beats_loss=0.007829, ecapa_loss=0.0001643, whisper_loss=0.0991, over 22528.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01064, ecapa_loss=0.0001567, whisper_loss=0.09241, over 3897194.26 frames. ], batch size: 88, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:41:20,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2024-08-14 08:41:20,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2573720.0, ans=0.2 2024-08-14 08:41:22,584 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 08:42:38,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-08-14 08:42:38,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11050, loss[loss=0.09917, beats_loss=0.01115, ecapa_loss=0.0001377, whisper_loss=0.08664, over 22824.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01065, ecapa_loss=0.000157, whisper_loss=0.09206, over 3925572.42 frames. ], batch size: 92, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:42:47,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2574220.0, ans=0.125 2024-08-14 08:42:49,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-14 08:43:15,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2574420.0, ans=10.0 2024-08-14 08:43:23,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2574420.0, ans=0.0 2024-08-14 08:43:35,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-14 08:43:41,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.313e+01 2.557e+01 2.807e+01 4.067e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-14 08:43:51,025 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 08:44:15,673 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 08:44:20,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11100, loss[loss=0.1045, beats_loss=0.01183, ecapa_loss=0.0001214, whisper_loss=0.09145, over 23078.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01062, ecapa_loss=0.0001563, whisper_loss=0.09278, over 3919557.89 frames. ], batch size: 90, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:44:24,496 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 08:44:50,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2574820.0, ans=0.2 2024-08-14 08:44:52,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2574820.0, ans=0.125 2024-08-14 08:45:10,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2574920.0, ans=0.2 2024-08-14 08:45:20,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2574920.0, ans=0.0 2024-08-14 08:46:03,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2575120.0, ans=0.0 2024-08-14 08:46:06,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2024-08-14 08:46:13,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11150, loss[loss=0.1159, beats_loss=0.009017, ecapa_loss=0.0001517, whisper_loss=0.1054, over 22246.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01058, ecapa_loss=0.0001561, whisper_loss=0.09295, over 3925358.92 frames. ], batch size: 86, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:46:59,078 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
32 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-14 08:47:10,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-14 08:47:29,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.286e+01 2.573e+01 3.032e+01 5.380e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-14 08:47:38,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2575520.0, ans=0.1 2024-08-14 08:47:48,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-14 08:48:02,125 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 08:48:11,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11200, loss[loss=0.08665, beats_loss=0.01102, ecapa_loss=0.0001868, whisper_loss=0.07376, over 20611.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01061, ecapa_loss=0.0001556, whisper_loss=0.09259, over 3919029.01 frames. ], batch size: 89, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:49:19,581 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 08:49:36,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11250, loss[loss=0.108, beats_loss=0.01055, ecapa_loss=0.0001658, whisper_loss=0.09581, over 21687.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01066, ecapa_loss=0.0001566, whisper_loss=0.09194, over 3912104.65 frames. ], batch size: 88, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:49:44,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2576220.0, ans=0.2 2024-08-14 08:49:58,992 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 08:50:00,691 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 08:50:12,389 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 9 from Vox, 37 fro AS 2024-08-14 08:50:13,607 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 08:50:25,076 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 08:50:26,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.439e+01 2.714e+01 2.976e+01 1.044e+02, threshold=5.429e+01, percent-clipped=2.0 2024-08-14 08:50:33,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2576520.0, ans=0.0 2024-08-14 08:50:38,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2576620.0, ans=0.0 2024-08-14 08:50:54,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11300, loss[loss=0.1061, beats_loss=0.008645, ecapa_loss=0.0001692, whisper_loss=0.09575, over 16534.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001567, whisper_loss=0.09153, over 3911194.30 frames. ], batch size: 64, lr: 3.45e-03, grad_scale: 1.152921504606847e+18 2024-08-14 08:51:19,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2576820.0, ans=0.125 2024-08-14 08:51:20,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2576820.0, ans=0.04949747468305833 2024-08-14 08:51:21,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.91 vs. 
limit=15.0 2024-08-14 08:51:57,309 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-14 08:51:59,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-08-14 08:52:03,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2577120.0, ans=0.125 2024-08-14 08:52:16,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11350, loss[loss=0.124, beats_loss=0.009345, ecapa_loss=0.0001899, whisper_loss=0.1127, over 22858.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01064, ecapa_loss=0.000156, whisper_loss=0.09229, over 3933001.40 frames. ], batch size: 94, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:52:34,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2577320.0, ans=0.2 2024-08-14 08:52:45,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5 2024-08-14 08:52:56,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2577420.0, ans=0.0 2024-08-14 08:53:01,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-14 08:53:05,329 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
29 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 08:53:05,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2577520.0, ans=0.0 2024-08-14 08:53:14,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.319e+01 2.550e+01 2.857e+01 6.146e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-14 08:53:15,708 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-14 08:53:40,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11400, loss[loss=0.1038, beats_loss=0.01155, ecapa_loss=0.0001322, whisper_loss=0.09094, over 22855.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01074, ecapa_loss=0.0001558, whisper_loss=0.09134, over 3923568.95 frames. ], batch size: 91, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:53:42,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2577720.0, ans=0.0 2024-08-14 08:53:53,552 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-14 08:54:19,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2577920.0, ans=0.95 2024-08-14 08:54:19,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2577920.0, ans=0.125 2024-08-14 08:54:19,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2577920.0, ans=0.07 2024-08-14 08:54:22,750 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 08:54:23,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2577920.0, ans=0.04949747468305833 2024-08-14 08:54:35,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2578020.0, ans=0.0 2024-08-14 08:54:58,260 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11450, loss[loss=0.09634, beats_loss=0.01085, ecapa_loss=0.0001245, whisper_loss=0.08424, over 16816.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01071, ecapa_loss=0.0001566, whisper_loss=0.09169, over 3923144.44 frames. ], batch size: 62, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:55:01,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2578220.0, ans=0.125 2024-08-14 08:55:03,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2578220.0, ans=0.125 2024-08-14 08:55:08,489 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 08:55:25,343 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 08:55:25,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2578320.0, ans=0.125 2024-08-14 08:55:37,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2578420.0, ans=0.1 2024-08-14 08:55:48,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.444e+01 2.679e+01 2.887e+01 5.368e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-14 08:55:57,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2578620.0, ans=0.1 2024-08-14 08:56:07,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2578620.0, ans=0.0 2024-08-14 08:56:12,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2578720.0, ans=0.125 2024-08-14 08:56:13,507 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11500, loss[loss=0.09801, beats_loss=0.01124, ecapa_loss=0.0001556, whisper_loss=0.08522, over 21542.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001566, whisper_loss=0.09203, over 3894713.09 frames. ], batch size: 89, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:56:14,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2578720.0, ans=15.0 2024-08-14 08:56:17,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.26 vs. 
limit=15.0 2024-08-14 08:56:35,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2578820.0, ans=0.0 2024-08-14 08:56:38,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-14 08:57:11,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2024-08-14 08:57:14,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2579120.0, ans=0.035 2024-08-14 08:57:28,738 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11550, loss[loss=0.1077, beats_loss=0.01102, ecapa_loss=0.0001468, whisper_loss=0.09525, over 21948.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01064, ecapa_loss=0.0001563, whisper_loss=0.09187, over 3874042.92 frames. ], batch size: 90, lr: 3.45e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:57:38,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2579220.0, ans=0.125 2024-08-14 08:57:39,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2579220.0, ans=0.1 2024-08-14 08:57:46,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2579320.0, ans=15.0 2024-08-14 08:58:01,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5 2024-08-14 08:58:14,503 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
17 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 08:58:18,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.371e+01 2.639e+01 3.011e+01 4.840e+01, threshold=5.278e+01, percent-clipped=0.0 2024-08-14 08:58:20,523 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 08:58:20,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2579520.0, ans=0.125 2024-08-14 08:58:21,779 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 08:58:34,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2579620.0, ans=0.0 2024-08-14 08:58:36,007 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 08:58:41,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11600, loss[loss=0.1055, beats_loss=0.008103, ecapa_loss=0.0001953, whisper_loss=0.09548, over 22031.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001581, whisper_loss=0.09184, over 3898115.41 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:58:49,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2579720.0, ans=0.2 2024-08-14 08:58:56,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2579820.0, ans=0.0 2024-08-14 08:59:22,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2579920.0, ans=0.1 2024-08-14 08:59:23,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-14 08:59:29,221 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 08:59:31,831 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 08:59:48,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2580120.0, ans=0.125 2024-08-14 08:59:51,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2580120.0, ans=0.2 2024-08-14 08:59:52,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2024-08-14 08:59:53,119 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11650, loss[loss=0.1378, beats_loss=0.00822, ecapa_loss=0.0001563, whisper_loss=0.128, over 16620.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01066, ecapa_loss=0.0001577, whisper_loss=0.09157, over 3883196.85 frames. ], batch size: 61, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 08:59:58,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-14 08:59:59,100 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 08:59:59,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2580220.0, ans=0.2 2024-08-14 09:00:03,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.63 vs. 
limit=15.0 2024-08-14 09:00:07,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2580320.0, ans=0.015 2024-08-14 09:00:15,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2580320.0, ans=0.125 2024-08-14 09:00:17,723 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 09:00:41,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.343e+01 2.621e+01 2.855e+01 6.176e+01, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 09:00:42,044 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 09:00:54,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2580620.0, ans=0.125 2024-08-14 09:01:06,915 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11700, loss[loss=0.1, beats_loss=0.01015, ecapa_loss=0.0002163, whisper_loss=0.08772, over 20638.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.000158, whisper_loss=0.09183, over 3897834.85 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:01:23,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-14 09:01:29,229 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:01:36,020 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 09:01:49,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2581020.0, ans=0.025 2024-08-14 09:02:03,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2024-08-14 09:02:12,939 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 09:02:13,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2024-08-14 09:02:18,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11750, loss[loss=0.108, beats_loss=0.01056, ecapa_loss=0.0001413, whisper_loss=0.09599, over 20865.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001568, whisper_loss=0.09141, over 3911374.99 frames. ], batch size: 81, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:02:34,447 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 09:02:50,523 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 09:02:52,777 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 09:02:55,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2581420.0, ans=0.125 2024-08-14 09:03:03,168 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
21 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-14 09:03:07,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.345e+01 2.656e+01 2.861e+01 8.705e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-14 09:03:16,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2581620.0, ans=0.0 2024-08-14 09:03:20,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2581620.0, ans=0.0 2024-08-14 09:03:30,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11800, loss[loss=0.09575, beats_loss=0.01147, ecapa_loss=0.000169, whisper_loss=0.08259, over 21281.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01083, ecapa_loss=0.0001564, whisper_loss=0.09116, over 3900926.59 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:03:30,272 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-14 09:03:38,759 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 09:03:39,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2581720.0, ans=0.125 2024-08-14 09:03:47,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2581820.0, ans=0.125 2024-08-14 09:03:50,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2581820.0, ans=0.125 2024-08-14 09:04:12,250 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
29 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 09:04:12,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2581920.0, ans=0.125 2024-08-14 09:04:28,733 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 09:04:34,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2582020.0, ans=0.125 2024-08-14 09:04:46,892 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 09:04:58,258 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11850, loss[loss=0.1141, beats_loss=0.008591, ecapa_loss=0.0001775, whisper_loss=0.1037, over 16731.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001566, whisper_loss=0.09233, over 3892705.92 frames. ], batch size: 64, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:05:11,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2582220.0, ans=0.125 2024-08-14 09:05:16,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2582320.0, ans=0.0 2024-08-14 09:05:20,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-08-14 09:05:23,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2582320.0, ans=0.0 2024-08-14 09:05:25,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2582320.0, ans=0.95 2024-08-14 09:05:34,471 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 09:05:38,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2582420.0, ans=0.1 2024-08-14 09:05:39,076 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 09:05:41,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2582420.0, ans=0.1 2024-08-14 09:05:55,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2582520.0, ans=0.0 2024-08-14 09:06:01,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.374e+01 2.631e+01 2.932e+01 6.705e+01, threshold=5.263e+01, percent-clipped=1.0 2024-08-14 09:06:14,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2582620.0, ans=0.125 2024-08-14 09:06:24,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2582620.0, ans=0.0 2024-08-14 09:06:30,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11900, loss[loss=0.1072, beats_loss=0.01254, ecapa_loss=0.0001338, whisper_loss=0.09334, over 22030.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01082, ecapa_loss=0.000156, whisper_loss=0.09145, over 3891567.63 frames. ], batch size: 90, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:06:35,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2582720.0, ans=0.2 2024-08-14 09:06:50,664 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 09:06:50,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2582820.0, ans=0.0 2024-08-14 09:07:15,231 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-14 09:07:16,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2582920.0, ans=0.0 2024-08-14 09:07:36,433 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 09:07:55,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2583120.0, ans=0.05 2024-08-14 09:08:01,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 11950, loss[loss=0.1102, beats_loss=0.009393, ecapa_loss=0.0001802, whisper_loss=0.09901, over 20906.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001569, whisper_loss=0.09147, over 3859283.42 frames. ], batch size: 87, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:08:02,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-14 09:08:09,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2583220.0, ans=0.125 2024-08-14 09:08:18,184 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 09:08:21,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2583320.0, ans=0.125 2024-08-14 09:08:39,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2583420.0, ans=0.125 2024-08-14 09:08:41,552 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 12 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-14 09:08:44,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2583420.0, ans=0.1 2024-08-14 09:08:45,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2583420.0, ans=0.125 2024-08-14 09:08:55,748 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.350e+01 2.661e+01 2.995e+01 4.363e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-14 09:08:59,493 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-14 09:09:14,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2583620.0, ans=0.125 2024-08-14 09:09:20,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2024-08-14 09:09:22,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2583720.0, ans=0.0 2024-08-14 09:09:22,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12000, loss[loss=0.08654, beats_loss=0.007577, ecapa_loss=0.0001762, whisper_loss=0.0772, over 13375.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001572, whisper_loss=0.09092, over 3832317.54 frames. 
], batch size: 54, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:09:22,783 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 09:10:01,044 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005459, whisper_loss=0.2479, over 922467.00 frames. 2024-08-14 09:10:19,215 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on SV_voxceleb1: loss=0.004372, beats_loss=0, ecapa_loss=0.0004372, whisper_loss=0, over 939242.00 frames. 2024-08-14 09:12:09,317 INFO [train_multi_KD3.py:1149] (3/4) Epoch 18, validation on AT_audioset: loss=0.02349, beats_loss=0.02349, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 09:12:09,327 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 09:12:14,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2583720.0, ans=0.125 2024-08-14 09:12:15,689 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 09:12:18,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-08-14 09:12:18,860 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-14 09:12:26,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2583820.0, ans=0.07 2024-08-14 09:12:26,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. 
limit=6.0 2024-08-14 09:12:32,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2583820.0, ans=0.0 2024-08-14 09:13:00,239 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-14 09:13:12,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2584120.0, ans=0.125 2024-08-14 09:13:24,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2584120.0, ans=0.2 2024-08-14 09:13:27,592 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12050, loss[loss=0.1047, beats_loss=0.01042, ecapa_loss=0.0001622, whisper_loss=0.09269, over 16535.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001576, whisper_loss=0.09075, over 3806406.05 frames. ], batch size: 68, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:13:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2584320.0, ans=0.125 2024-08-14 09:14:01,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2584420.0, ans=0.0 2024-08-14 09:14:01,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2584420.0, ans=0.125 2024-08-14 09:14:07,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2584420.0, ans=0.0 2024-08-14 09:14:11,926 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-14 09:14:13,265 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 09:14:16,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2024-08-14 09:14:20,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.316e+01 2.502e+01 2.940e+01 4.119e+01, threshold=5.004e+01, percent-clipped=0.0 2024-08-14 09:14:23,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-14 09:14:35,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2584620.0, ans=0.125 2024-08-14 09:14:44,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12100, loss[loss=0.08867, beats_loss=0.01255, ecapa_loss=0.0001416, whisper_loss=0.0747, over 21251.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001579, whisper_loss=0.09106, over 3848630.83 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:14:47,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2024-08-14 09:15:07,324 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 09:15:28,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2585020.0, ans=0.5 2024-08-14 09:16:00,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12150, loss[loss=0.1135, beats_loss=0.009422, ecapa_loss=0.000145, whisper_loss=0.1026, over 22813.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.000157, whisper_loss=0.09051, over 3847313.48 frames. 
], batch size: 91, lr: 3.44e-03, grad_scale: 5.764607523034235e+17 2024-08-14 09:16:01,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-14 09:16:04,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2585220.0, ans=0.125 2024-08-14 09:16:31,325 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.007e-02 2024-08-14 09:16:50,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2585520.0, ans=0.0 2024-08-14 09:16:50,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.391e+01 2.592e+01 3.075e+01 2.876e+02, threshold=5.185e+01, percent-clipped=6.0 2024-08-14 09:16:56,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2585520.0, ans=0.2 2024-08-14 09:17:02,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2585620.0, ans=0.125 2024-08-14 09:17:15,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12200, loss[loss=0.1073, beats_loss=0.01079, ecapa_loss=0.0001684, whisper_loss=0.09486, over 20648.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0108, ecapa_loss=0.0001576, whisper_loss=0.08992, over 3825577.93 frames. ], batch size: 84, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:17:15,507 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 09:17:41,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2585820.0, ans=0.0 2024-08-14 09:18:03,253 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 09:18:23,361 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06917443126440048, model_norm_threshold=51.84561538696289 2024-08-14 09:18:23,574 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.256e+05, grad_sumsq=1.256e+05, orig_rms_sq=1.000e+00 2024-08-14 09:18:27,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-08-14 09:18:29,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12250, loss[loss=0.1154, beats_loss=0.01078, ecapa_loss=0.0001476, whisper_loss=0.1031, over 22660.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001573, whisper_loss=0.09083, over 3841590.53 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:18:38,529 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 09:18:53,365 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 09:19:14,058 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-14 09:19:14,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2586520.0, ans=0.125 2024-08-14 09:19:14,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2586520.0, ans=0.125 2024-08-14 09:19:23,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.394e+01 2.719e+01 3.099e+01 7.495e+02, threshold=5.439e+01, percent-clipped=1.0 2024-08-14 09:19:25,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2024-08-14 09:19:30,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=12.0 2024-08-14 09:19:32,246 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 09:19:33,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2586620.0, ans=0.0 2024-08-14 09:19:36,736 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 09:19:42,743 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 09:19:47,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12300, loss[loss=0.09998, beats_loss=0.01133, ecapa_loss=0.0001267, whisper_loss=0.08739, over 16766.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001581, whisper_loss=0.09048, over 3846550.20 frames. ], batch size: 65, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:19:50,279 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 09:20:40,357 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 09:21:05,773 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 09:21:08,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2587120.0, ans=0.125 2024-08-14 09:21:10,057 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 09:21:22,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12350, loss[loss=0.09212, beats_loss=0.01154, ecapa_loss=0.0002253, whisper_loss=0.07832, over 18640.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.000158, whisper_loss=0.09015, over 3838962.02 frames. ], batch size: 84, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:21:49,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2587320.0, ans=0.0 2024-08-14 09:21:54,676 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-14 09:22:15,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=12.0 2024-08-14 09:22:20,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2587520.0, ans=0.125 2024-08-14 09:22:24,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.397e+01 2.618e+01 2.960e+01 3.782e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-14 09:22:35,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2587620.0, ans=0.0 2024-08-14 09:22:47,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12400, loss[loss=0.1052, beats_loss=0.00999, ecapa_loss=0.0001842, whisper_loss=0.09337, over 21689.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001569, whisper_loss=0.0909, over 3863226.40 frames. ], batch size: 90, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:22:54,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2587720.0, ans=0.125 2024-08-14 09:23:10,550 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 09:23:24,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2587920.0, ans=0.125 2024-08-14 09:23:48,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2588120.0, ans=0.2 2024-08-14 09:23:59,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2588120.0, ans=0.0 2024-08-14 09:24:02,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12450, loss[loss=0.09064, beats_loss=0.01137, ecapa_loss=0.0001636, whisper_loss=0.07763, over 16850.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001559, whisper_loss=0.09152, over 3869565.57 frames. ], batch size: 68, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:24:04,593 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 09:24:31,449 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 09:24:41,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2588420.0, ans=0.125 2024-08-14 09:24:54,069 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
31 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 09:24:55,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.420e+01 2.657e+01 3.140e+01 9.625e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-14 09:25:03,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2588620.0, ans=0.125 2024-08-14 09:25:14,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2588620.0, ans=0.125 2024-08-14 09:25:18,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12500, loss[loss=0.1139, beats_loss=0.01003, ecapa_loss=0.0001705, whisper_loss=0.1022, over 16446.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01055, ecapa_loss=0.0001569, whisper_loss=0.09233, over 3862704.32 frames. ], batch size: 66, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 09:25:22,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2588720.0, ans=0.125 2024-08-14 09:25:27,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2588720.0, ans=0.1 2024-08-14 09:25:41,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2588820.0, ans=0.0 2024-08-14 09:25:56,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.66 vs. 
limit=15.0
2024-08-14 09:26:07,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2589020.0, ans=0.0
2024-08-14 09:26:09,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2589020.0, ans=0.125
2024-08-14 09:26:11,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2589020.0, ans=0.1
2024-08-14 09:26:11,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2589020.0, ans=0.1
2024-08-14 09:26:19,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2589120.0, ans=0.1
2024-08-14 09:26:33,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2589120.0, ans=0.2
2024-08-14 09:26:35,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12550, loss[loss=0.1178, beats_loss=0.009752, ecapa_loss=0.0001218, whisper_loss=0.1068, over 19417.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01052, ecapa_loss=0.0001572, whisper_loss=0.09281, over 3874759.13 frames. ], batch size: 73, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:26:37,157 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 09:26:39,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2589220.0, ans=0.0
2024-08-14 09:26:51,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2589320.0, ans=0.125
2024-08-14 09:27:20,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0
2024-08-14 09:27:22,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2589520.0, ans=0.2
2024-08-14 09:27:28,067 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 14 from LS+wenet, 17 from Vox, 46 from AS
2024-08-14 09:27:29,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.466e+01 2.734e+01 3.063e+01 5.302e+01, threshold=5.468e+01, percent-clipped=0.0
2024-08-14 09:27:48,210 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 from AS
2024-08-14 09:27:54,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12600, loss[loss=0.1145, beats_loss=0.00987, ecapa_loss=0.000124, whisper_loss=0.1034, over 16237.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01067, ecapa_loss=0.0001568, whisper_loss=0.09218, over 3882935.77 frames. ], batch size: 61, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:28:14,122 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 15 from LS+wenet, 27 from Vox, 39 from AS
2024-08-14 09:28:40,414 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 13 from Vox, 34 from AS
2024-08-14 09:28:41,873 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS
2024-08-14 09:28:53,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2590020.0, ans=0.035
2024-08-14 09:29:09,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2590120.0, ans=0.125
2024-08-14 09:29:29,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12650, loss[loss=0.08949, beats_loss=0.013, ecapa_loss=0.0001709, whisper_loss=0.07478, over 21884.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001568, whisper_loss=0.09146, over 3876873.47 frames. ], batch size: 93, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:29:31,059 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 from AS
2024-08-14 09:30:09,917 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS
2024-08-14 09:30:10,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2590420.0, ans=0.125
2024-08-14 09:30:12,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2590420.0, ans=0.125
2024-08-14 09:30:28,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2590520.0, ans=0.125
2024-08-14 09:30:29,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2590520.0, ans=0.125
2024-08-14 09:30:37,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.252e+01 2.572e+01 2.889e+01 4.246e+01, threshold=5.144e+01, percent-clipped=0.0
2024-08-14 09:31:01,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12700, loss[loss=0.114, beats_loss=0.01018, ecapa_loss=0.0001729, whisper_loss=0.1021, over 22709.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01076, ecapa_loss=0.0001568, whisper_loss=0.09143, over 3869223.53 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:31:02,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2590720.0, ans=0.125
2024-08-14 09:31:05,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2590720.0, ans=0.125
2024-08-14 09:31:11,942 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 13 from Vox, 49 from AS
2024-08-14 09:31:30,131 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:31:40,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2590920.0, ans=0.0
2024-08-14 09:31:48,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2024-08-14 09:31:55,130 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 10 from Vox, 26 from AS
2024-08-14 09:32:13,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2591120.0, ans=0.125
2024-08-14 09:32:15,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12750, loss[loss=0.123, beats_loss=0.007287, ecapa_loss=0.000148, whisper_loss=0.1142, over 16519.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001578, whisper_loss=0.09164, over 3863087.14 frames. ], batch size: 60, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:32:19,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0
2024-08-14 09:32:23,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2591220.0, ans=0.1
2024-08-14 09:32:30,662 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 from AS
2024-08-14 09:32:35,363 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 from AS
2024-08-14 09:32:44,617 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 from AS
2024-08-14 09:32:59,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2591520.0, ans=0.0
2024-08-14 09:33:07,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.400e+01 2.605e+01 3.000e+01 2.756e+02, threshold=5.209e+01, percent-clipped=1.0
2024-08-14 09:33:08,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2591520.0, ans=0.1
2024-08-14 09:33:30,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12800, loss[loss=0.1035, beats_loss=0.01241, ecapa_loss=0.000132, whisper_loss=0.08973, over 22419.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01081, ecapa_loss=0.0001586, whisper_loss=0.09165, over 3851940.57 frames. ], batch size: 88, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:33:37,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2591720.0, ans=0.2
2024-08-14 09:33:42,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2591720.0, ans=0.125
2024-08-14 09:33:43,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2591820.0, ans=0.125
2024-08-14 09:34:11,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=12.0
2024-08-14 09:34:18,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2592020.0, ans=0.0
2024-08-14 09:34:20,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5
2024-08-14 09:34:25,294 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 29 from Vox, 38 from AS
2024-08-14 09:34:35,876 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 from AS
2024-08-14 09:35:17,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12850, loss[loss=0.1079, beats_loss=0.01158, ecapa_loss=0.0001448, whisper_loss=0.09487, over 18241.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001598, whisper_loss=0.09113, over 3887127.94 frames. ], batch size: 72, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:35:40,300 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:36:08,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2592420.0, ans=0.125
2024-08-14 09:36:23,577 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.339e+01 2.541e+01 2.791e+01 1.384e+02, threshold=5.082e+01, percent-clipped=3.0
2024-08-14 09:36:33,201 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 from AS
2024-08-14 09:36:36,105 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 from AS
2024-08-14 09:36:45,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2592720.0, ans=0.125
2024-08-14 09:36:46,319 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12900, loss[loss=0.1391, beats_loss=0.008157, ecapa_loss=0.000143, whisper_loss=0.1296, over 17901.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001596, whisper_loss=0.09097, over 3885686.45 frames. ], batch size: 67, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:36:50,011 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-14 09:36:59,070 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS
2024-08-14 09:37:01,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2592720.0, ans=0.125
2024-08-14 09:37:12,233 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS
2024-08-14 09:37:24,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0
2024-08-14 09:37:30,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2592920.0, ans=0.0
2024-08-14 09:37:57,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. limit=10.0
2024-08-14 09:38:05,997 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS
2024-08-14 09:38:07,575 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 from AS
2024-08-14 09:38:24,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2593120.0, ans=0.125
2024-08-14 09:38:37,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 12950, loss[loss=0.09224, beats_loss=0.01177, ecapa_loss=0.0001697, whisper_loss=0.07877, over 22362.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001597, whisper_loss=0.09148, over 3918253.42 frames. ], batch size: 94, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:38:45,890 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 26 from Vox, 24 from AS
2024-08-14 09:38:50,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2593220.0, ans=0.2
2024-08-14 09:38:51,781 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS
2024-08-14 09:39:39,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2593520.0, ans=0.0
2024-08-14 09:39:50,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.412e+01 2.710e+01 3.075e+01 4.932e+01, threshold=5.420e+01, percent-clipped=0.0
2024-08-14 09:40:03,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2593620.0, ans=0.1
2024-08-14 09:40:12,897 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.630e-03
2024-08-14 09:40:27,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2593720.0, ans=0.0
2024-08-14 09:40:27,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13000, loss[loss=0.09494, beats_loss=0.01198, ecapa_loss=0.0001445, whisper_loss=0.08152, over 20652.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.000159, whisper_loss=0.09166, over 3924387.79 frames. ], batch size: 83, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:40:52,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2593820.0, ans=0.125
2024-08-14 09:41:57,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0
2024-08-14 09:42:17,368 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 09:42:22,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13050, loss[loss=0.1056, beats_loss=0.01026, ecapa_loss=0.0001745, whisper_loss=0.09364, over 22737.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001593, whisper_loss=0.0915, over 3928639.92 frames. ], batch size: 92, lr: 3.44e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:42:33,971 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.857e-02
2024-08-14 09:42:45,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2594320.0, ans=0.0
2024-08-14 09:42:46,714 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 24 from Vox, 32 from AS
2024-08-14 09:42:47,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2594320.0, ans=0.1
2024-08-14 09:42:53,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.87 vs. limit=22.5
2024-08-14 09:42:55,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2594320.0, ans=0.125
2024-08-14 09:43:12,517 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 from AS
2024-08-14 09:43:27,786 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 12 from Vox, 20 from AS
2024-08-14 09:43:32,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.338e+01 2.607e+01 2.942e+01 4.688e+01, threshold=5.215e+01, percent-clipped=0.0
2024-08-14 09:43:41,333 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 from AS
2024-08-14 09:43:47,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2594620.0, ans=0.0
2024-08-14 09:44:03,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13100, loss[loss=0.09514, beats_loss=0.01208, ecapa_loss=0.000162, whisper_loss=0.08144, over 14617.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001585, whisper_loss=0.09127, over 3891680.61 frames. ], batch size: 61, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:44:11,749 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 19 from Vox, 47 from AS
2024-08-14 09:44:13,711 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS
2024-08-14 09:44:15,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2594720.0, ans=15.0
2024-08-14 09:44:36,426 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 from AS
2024-08-14 09:44:40,314 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS
2024-08-14 09:44:42,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2594820.0, ans=0.125
2024-08-14 09:44:59,090 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 from AS
2024-08-14 09:45:09,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2595020.0, ans=0.125
2024-08-14 09:45:21,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2595120.0, ans=0.0
2024-08-14 09:45:29,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13150, loss[loss=0.1138, beats_loss=0.01046, ecapa_loss=0.0001379, whisper_loss=0.102, over 19292.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001585, whisper_loss=0.09118, over 3886144.58 frames. ], batch size: 73, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:45:40,935 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 09:45:41,341 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.504e+05
2024-08-14 09:45:47,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2595320.0, ans=0.125
2024-08-14 09:45:52,782 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.307e-01
2024-08-14 09:45:58,859 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:46:05,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2595420.0, ans=0.125
2024-08-14 09:46:07,117 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 from AS
2024-08-14 09:46:16,357 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 from AS
2024-08-14 09:46:17,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2595520.0, ans=0.125
2024-08-14 09:46:20,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.308e+01 2.613e+01 2.975e+01 4.681e+01, threshold=5.226e+01, percent-clipped=0.0
2024-08-14 09:46:20,515 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 16 from Vox, 21 from AS
2024-08-14 09:46:23,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2595520.0, ans=0.125
2024-08-14 09:46:37,687 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 10 from LS+wenet, 23 from Vox, 34 from AS
2024-08-14 09:46:42,156 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13200, loss[loss=0.1066, beats_loss=0.01198, ecapa_loss=0.0001346, whisper_loss=0.09329, over 22827.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001574, whisper_loss=0.09125, over 3879109.21 frames. ], batch size: 92, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:46:42,350 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 26 from Vox, 27 from AS
2024-08-14 09:46:44,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2595720.0, ans=0.125
2024-08-14 09:46:44,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2595720.0, ans=0.125
2024-08-14 09:46:53,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.80 vs. limit=22.5
2024-08-14 09:47:18,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2595920.0, ans=0.125
2024-08-14 09:47:24,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2595920.0, ans=0.2
2024-08-14 09:47:27,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2595920.0, ans=0.1
2024-08-14 09:48:01,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0
2024-08-14 09:48:16,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13250, loss[loss=0.1052, beats_loss=0.01007, ecapa_loss=0.0001589, whisper_loss=0.09358, over 18363.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01057, ecapa_loss=0.0001576, whisper_loss=0.09261, over 3901613.90 frames. ], batch size: 73, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:48:19,490 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 from AS
2024-08-14 09:48:32,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0
2024-08-14 09:49:16,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2596520.0, ans=0.1
2024-08-14 09:49:21,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5
2024-08-14 09:49:26,596 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.423e+01 2.644e+01 3.025e+01 2.002e+02, threshold=5.289e+01, percent-clipped=3.0
2024-08-14 09:49:44,355 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 14 from Vox, 42 from AS
2024-08-14 09:49:49,897 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 from AS
2024-08-14 09:49:57,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13300, loss[loss=0.1122, beats_loss=0.008823, ecapa_loss=0.000196, whisper_loss=0.1014, over 20019.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01062, ecapa_loss=0.0001573, whisper_loss=0.09217, over 3914724.16 frames. ], batch size: 81, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:50:00,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2596720.0, ans=0.125
2024-08-14 09:50:10,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2596720.0, ans=0.035
2024-08-14 09:50:13,189 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 12 from Vox, 37 from AS
2024-08-14 09:50:14,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0
2024-08-14 09:50:16,910 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 from AS
2024-08-14 09:50:22,950 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 from AS
2024-08-14 09:50:53,223 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 from AS
2024-08-14 09:50:53,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2596920.0, ans=0.04949747468305833
2024-08-14 09:50:56,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0
2024-08-14 09:51:00,395 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS
2024-08-14 09:51:31,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13350, loss[loss=0.09299, beats_loss=0.01232, ecapa_loss=0.0001235, whisper_loss=0.07944, over 20331.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001554, whisper_loss=0.09114, over 3880546.45 frames. ], batch size: 77, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:51:33,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2597220.0, ans=0.0
2024-08-14 09:51:34,277 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS
2024-08-14 09:51:38,842 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 16 from Vox, 44 from AS
2024-08-14 09:51:51,375 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS
2024-08-14 09:51:52,898 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 20 from Vox, 40 from AS
2024-08-14 09:52:03,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2597420.0, ans=0.125
2024-08-14 09:52:06,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5
2024-08-14 09:52:23,790 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 from AS
2024-08-14 09:52:25,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.409e+01 2.670e+01 3.063e+01 5.921e+01, threshold=5.339e+01, percent-clipped=1.0
2024-08-14 09:52:26,583 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 from AS
2024-08-14 09:52:36,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2597620.0, ans=0.0
2024-08-14 09:52:47,924 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13400, loss[loss=0.1003, beats_loss=0.009508, ecapa_loss=0.000145, whisper_loss=0.0893, over 18323.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01081, ecapa_loss=0.0001551, whisper_loss=0.09073, over 3890485.38 frames. ], batch size: 70, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:52:55,955 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 from AS
2024-08-14 09:53:07,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2597820.0, ans=0.0
2024-08-14 09:53:10,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2597820.0, ans=0.09899494936611666
2024-08-14 09:53:23,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2597920.0, ans=0.0
2024-08-14 09:53:27,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2597920.0, ans=0.2
2024-08-14 09:53:35,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2598020.0, ans=0.125
2024-08-14 09:53:38,541 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 from AS
2024-08-14 09:53:43,731 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 from AS
2024-08-14 09:53:54,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0
2024-08-14 09:54:03,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=15.0
2024-08-14 09:54:05,720 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS
2024-08-14 09:54:06,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13450, loss[loss=0.08959, beats_loss=0.01201, ecapa_loss=0.0001525, whisper_loss=0.07605, over 18746.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001556, whisper_loss=0.09068, over 3905820.96 frames. ], batch size: 75, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:54:23,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2598320.0, ans=0.125
2024-08-14 09:54:24,422 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 from AS
2024-08-14 09:54:34,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2598320.0, ans=0.125
2024-08-14 09:54:38,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5
2024-08-14 09:54:42,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2598420.0, ans=0.125
2024-08-14 09:54:55,071 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 from AS
2024-08-14 09:54:59,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.340e+01 2.558e+01 2.955e+01 5.061e+01, threshold=5.115e+01, percent-clipped=0.0
2024-08-14 09:55:01,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2598520.0, ans=0.125
2024-08-14 09:55:20,858 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13500, loss[loss=0.1058, beats_loss=0.01031, ecapa_loss=0.0001413, whisper_loss=0.0941, over 22524.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001559, whisper_loss=0.09023, over 3868701.31 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:55:30,556 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 09:55:38,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0
2024-08-14 09:55:43,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2598820.0, ans=0.125
2024-08-14 09:55:46,556 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 from AS
2024-08-14 09:55:54,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.63 vs. limit=10.0
2024-08-14 09:55:56,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2598920.0, ans=0.0
2024-08-14 09:56:02,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2598920.0, ans=0.125
2024-08-14 09:56:06,759 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 25 from Vox, 27 from AS
2024-08-14 09:56:08,095 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 from AS
2024-08-14 09:56:25,055 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 from AS
2024-08-14 09:56:33,443 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13550, loss[loss=0.08053, beats_loss=0.01372, ecapa_loss=0.000136, whisper_loss=0.06545, over 20753.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01082, ecapa_loss=0.0001571, whisper_loss=0.09003, over 3874387.00 frames. ], batch size: 88, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:56:36,566 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 18 from Vox, 51 from AS
2024-08-14 09:56:53,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=15.0
2024-08-14 09:57:22,691 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 from AS
2024-08-14 09:57:24,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.303e+01 2.544e+01 2.977e+01 7.464e+01, threshold=5.088e+01, percent-clipped=1.0
2024-08-14 09:57:39,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0
2024-08-14 09:57:45,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13600, loss[loss=0.09029, beats_loss=0.009879, ecapa_loss=0.0002053, whisper_loss=0.07836, over 16014.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01086, ecapa_loss=0.0001557, whisper_loss=0.09039, over 3893622.29 frames. ], batch size: 66, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:57:48,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2599720.0, ans=0.02
2024-08-14 09:58:04,028 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:58:06,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2599820.0, ans=0.125
2024-08-14 09:58:07,951 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 16 from Vox, 42 from AS
2024-08-14 09:58:11,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.64 vs. limit=10.0
2024-08-14 09:58:20,995 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 8 from Vox, 35 from AS
2024-08-14 09:58:21,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0
2024-08-14 09:58:35,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2600020.0, ans=0.0
2024-08-14 09:59:01,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13650, loss[loss=0.1171, beats_loss=0.0126, ecapa_loss=0.0001355, whisper_loss=0.1032, over 19077.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001554, whisper_loss=0.09057, over 3877779.36 frames. ], batch size: 75, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 09:59:10,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2600220.0, ans=0.125
2024-08-14 09:59:26,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2600320.0, ans=0.0
2024-08-14 09:59:32,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2600420.0, ans=0.125
2024-08-14 09:59:37,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2600420.0, ans=0.125
2024-08-14 09:59:41,441 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 from AS
2024-08-14 09:59:41,815 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 09:59:51,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.361e+01 2.642e+01 3.028e+01 5.099e+01, threshold=5.285e+01, percent-clipped=1.0
2024-08-14 09:59:59,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2600620.0, ans=0.0
2024-08-14 10:00:08,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2600620.0, ans=0.2
2024-08-14 10:00:13,508 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13700, loss[loss=0.09662, beats_loss=0.0135, ecapa_loss=0.0001078, whisper_loss=0.08205, over 17098.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001546, whisper_loss=0.09083, over 3894870.02 frames. ], batch size: 63, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 10:00:24,482 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 10:00:44,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2600920.0, ans=0.0
2024-08-14 10:00:55,735 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 10:01:14,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2601120.0, ans=0.025
2024-08-14 10:01:16,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2601120.0, ans=0.1
2024-08-14 10:01:26,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13750, loss[loss=0.1033, beats_loss=0.01147, ecapa_loss=0.0001421, whisper_loss=0.09041, over 20904.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01088, ecapa_loss=0.0001534, whisper_loss=0.09015, over 3890371.15 frames. ], batch size: 84, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 10:01:31,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2601220.0, ans=0.125
2024-08-14 10:02:09,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2601520.0, ans=0.125
2024-08-14 10:02:15,913 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 from AS
2024-08-14 10:02:17,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.453e+01 2.720e+01 3.144e+01 4.957e+01, threshold=5.441e+01, percent-clipped=0.0
2024-08-14 10:02:17,525 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 10:02:34,386 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 10:02:39,707 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13800, loss[loss=0.1049, beats_loss=0.01216, ecapa_loss=0.0001428, whisper_loss=0.09135, over 15531.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01084, ecapa_loss=0.0001548, whisper_loss=0.08976, over 3905109.91 frames. ], batch size: 62, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:02:43,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2601720.0, ans=0.2 2024-08-14 10:02:55,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-14 10:02:57,037 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 10:03:01,361 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.662e-02 2024-08-14 10:03:46,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2602120.0, ans=0.1 2024-08-14 10:03:51,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13850, loss[loss=0.1015, beats_loss=0.009996, ecapa_loss=0.0001758, whisper_loss=0.08974, over 17263.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001553, whisper_loss=0.0906, over 3903408.74 frames. ], batch size: 71, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:04:00,343 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 10:04:06,128 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
17 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 10:04:17,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2602320.0, ans=0.0 2024-08-14 10:04:30,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2602420.0, ans=0.2 2024-08-14 10:04:40,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2602520.0, ans=0.0 2024-08-14 10:04:40,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.427e+01 2.695e+01 2.897e+01 4.823e+02, threshold=5.391e+01, percent-clipped=1.0 2024-08-14 10:05:02,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13900, loss[loss=0.08776, beats_loss=0.01187, ecapa_loss=0.0001518, whisper_loss=0.07438, over 22783.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01081, ecapa_loss=0.0001549, whisper_loss=0.09117, over 3950396.13 frames. ], batch size: 94, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:05:09,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-14 10:05:11,829 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
41 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 10:05:21,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2602820.0, ans=0.125 2024-08-14 10:05:25,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2602820.0, ans=0.125 2024-08-14 10:05:53,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2603020.0, ans=0.125 2024-08-14 10:06:08,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2603120.0, ans=0.125 2024-08-14 10:06:15,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 13950, loss[loss=0.1264, beats_loss=0.008932, ecapa_loss=0.0001858, whisper_loss=0.1157, over 14367.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01079, ecapa_loss=0.0001544, whisper_loss=0.09156, over 3930359.82 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:06:15,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2603220.0, ans=0.2 2024-08-14 10:06:20,869 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 10:06:31,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2603320.0, ans=0.125 2024-08-14 10:06:41,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.18 vs. limit=6.0 2024-08-14 10:06:46,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2603420.0, ans=0.125 2024-08-14 10:06:51,840 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 10:06:56,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2603520.0, ans=0.0 2024-08-14 10:07:04,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.381e+01 2.587e+01 2.937e+01 5.454e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-14 10:07:22,083 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 10:07:26,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14000, loss[loss=0.1024, beats_loss=0.01234, ecapa_loss=0.0001306, whisper_loss=0.08879, over 22760.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01085, ecapa_loss=0.0001538, whisper_loss=0.09126, over 3944536.53 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:07:28,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0 2024-08-14 10:07:32,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0 2024-08-14 10:07:40,014 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 10:07:57,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5 2024-08-14 10:08:06,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2603920.0, ans=0.0 2024-08-14 10:08:27,783 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 10:08:29,269 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 10:08:37,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-14 10:08:39,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14050, loss[loss=0.1053, beats_loss=0.008848, ecapa_loss=0.0001407, whisper_loss=0.09503, over 17676.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01075, ecapa_loss=0.0001541, whisper_loss=0.09174, over 3915316.29 frames. ], batch size: 67, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:08:57,885 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-14 10:09:03,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2024-08-14 10:09:05,090 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-14 10:09:17,108 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 10:09:28,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2604520.0, ans=0.0 2024-08-14 10:09:29,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.346e+01 2.567e+01 2.904e+01 5.000e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-14 10:09:33,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2024-08-14 10:09:38,558 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 10:09:50,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14100, loss[loss=0.1142, beats_loss=0.009014, ecapa_loss=0.0001634, whisper_loss=0.1035, over 23316.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001537, whisper_loss=0.09136, over 3898253.46 frames. ], batch size: 92, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:10:10,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2604820.0, ans=0.125 2024-08-14 10:10:11,408 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 10:11:00,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2605120.0, ans=0.035 2024-08-14 10:11:03,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14150, loss[loss=0.09884, beats_loss=0.01339, ecapa_loss=0.0001692, whisper_loss=0.08376, over 20508.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001539, whisper_loss=0.0918, over 3917277.88 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 10:11:19,321 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 10:11:46,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2605520.0, ans=0.95 2024-08-14 10:11:48,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2605520.0, ans=0.125 2024-08-14 10:11:48,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2605520.0, ans=0.125 2024-08-14 10:11:53,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.390e+01 2.553e+01 2.829e+01 7.364e+01, threshold=5.106e+01, percent-clipped=2.0 2024-08-14 10:12:05,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2605620.0, ans=0.0 2024-08-14 10:12:15,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14200, loss[loss=0.1219, beats_loss=0.008127, ecapa_loss=0.0001491, whisper_loss=0.1123, over 15089.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001539, whisper_loss=0.0916, over 3909505.55 frames. ], batch size: 54, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:12:27,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2605720.0, ans=0.2 2024-08-14 10:12:44,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-14 10:12:47,195 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 10:12:54,275 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 10:12:58,968 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 10:13:07,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2606020.0, ans=0.125 2024-08-14 10:13:27,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14250, loss[loss=0.1031, beats_loss=0.01222, ecapa_loss=0.0001456, whisper_loss=0.08945, over 17741.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001533, whisper_loss=0.09114, over 3932952.89 frames. ], batch size: 71, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:13:30,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2606220.0, ans=0.0 2024-08-14 10:13:33,031 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 10:13:57,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.79 vs. 
limit=22.5 2024-08-14 10:14:01,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2606420.0, ans=10.0 2024-08-14 10:14:03,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2606420.0, ans=0.125 2024-08-14 10:14:11,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2606520.0, ans=0.05 2024-08-14 10:14:14,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2606520.0, ans=0.125 2024-08-14 10:14:18,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.438e+01 2.661e+01 3.044e+01 6.273e+01, threshold=5.322e+01, percent-clipped=2.0 2024-08-14 10:14:19,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2606520.0, ans=0.0 2024-08-14 10:14:30,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2606620.0, ans=0.1 2024-08-14 10:14:39,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14300, loss[loss=0.09051, beats_loss=0.01214, ecapa_loss=0.0001836, whisper_loss=0.07653, over 20050.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001542, whisper_loss=0.09081, over 3881237.40 frames. ], batch size: 88, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:14:49,790 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 19 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 10:14:50,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. 
limit=15.0 2024-08-14 10:15:00,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-14 10:15:16,348 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:15:22,598 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 10:15:24,143 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-14 10:15:26,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2607020.0, ans=0.125 2024-08-14 10:15:28,844 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-14 10:15:31,915 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 10:15:41,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2607120.0, ans=0.125 2024-08-14 10:15:42,650 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 10:15:45,962 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 10:15:52,775 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 10:15:55,696 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14350, loss[loss=0.1073, beats_loss=0.009446, ecapa_loss=0.0001666, whisper_loss=0.09621, over 19993.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.0001542, whisper_loss=0.09084, over 3893850.43 frames. 
], batch size: 80, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:16:19,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.25 vs. limit=15.0 2024-08-14 10:16:29,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2607420.0, ans=0.125 2024-08-14 10:16:30,534 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 10:16:35,868 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-14 10:16:56,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.361e+01 2.607e+01 2.997e+01 4.259e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-14 10:17:23,424 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14400, loss[loss=0.09251, beats_loss=0.01107, ecapa_loss=0.000159, whisper_loss=0.07985, over 14067.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.000155, whisper_loss=0.09063, over 3916956.26 frames. ], batch size: 56, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:17:38,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2607720.0, ans=0.05 2024-08-14 10:17:41,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2607820.0, ans=0.125 2024-08-14 10:17:53,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2607820.0, ans=0.0 2024-08-14 10:18:45,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 18, batch 14450, loss[loss=0.1144, beats_loss=0.01082, ecapa_loss=0.000166, whisper_loss=0.1019, over 22116.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001556, whisper_loss=0.09116, over 3903280.16 frames. ], batch size: 91, lr: 3.43e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:18:49,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2608220.0, ans=0.125 2024-08-14 10:18:57,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2608220.0, ans=0.125 2024-08-14 10:19:02,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2608320.0, ans=0.1 2024-08-14 10:19:03,777 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-14 10:19:22,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-08-14 10:19:22,733 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 10:19:55,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 0, loss[loss=0.08942, beats_loss=0.01038, ecapa_loss=0.0001566, whisper_loss=0.07747, over 18427.00 frames. ], tot_loss[loss=0.08942, beats_loss=0.01038, ecapa_loss=0.0001566, whisper_loss=0.07747, over 18427.00 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:19:55,444 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 10:20:37,914 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005486, whisper_loss=0.2484, over 922467.00 frames. 2024-08-14 10:20:53,978 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on SV_voxceleb1: loss=0.004382, beats_loss=0, ecapa_loss=0.0004382, whisper_loss=0, over 939242.00 frames. 
2024-08-14 10:22:56,829 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 10:22:56,832 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 10:22:58,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2608520.0, ans=0.125 2024-08-14 10:23:05,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2608520.0, ans=0.125 2024-08-14 10:23:09,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.576e+01 3.022e+01 6.974e+01, threshold=5.152e+01, percent-clipped=1.0 2024-08-14 10:23:26,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2608620.0, ans=0.1 2024-08-14 10:23:38,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-14 10:23:40,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.14 vs. limit=15.0 2024-08-14 10:23:50,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2608720.0, ans=0.125 2024-08-14 10:23:51,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2608720.0, ans=0.125 2024-08-14 10:24:07,176 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 10:24:11,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. 
limit=15.0 2024-08-14 10:24:18,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2608820.0, ans=0.125 2024-08-14 10:24:23,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2608820.0, ans=0.0 2024-08-14 10:24:34,759 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 10:24:36,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2608820.0, ans=0.0 2024-08-14 10:24:43,523 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 10:24:49,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2608920.0, ans=0.125 2024-08-14 10:24:49,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2608920.0, ans=0.125 2024-08-14 10:25:06,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 50, loss[loss=0.08882, beats_loss=0.01098, ecapa_loss=0.0001145, whisper_loss=0.0767, over 19962.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009668, ecapa_loss=0.0001612, whisper_loss=0.09103, over 907483.52 frames. ], batch size: 79, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:25:12,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2609020.0, ans=0.0 2024-08-14 10:25:17,680 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 10:25:55,477 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
18 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 10:26:14,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2609220.0, ans=0.1 2024-08-14 10:26:33,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2609320.0, ans=0.125 2024-08-14 10:26:41,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2609420.0, ans=0.125 2024-08-14 10:26:59,045 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-14 10:27:04,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 100, loss[loss=0.1007, beats_loss=0.007758, ecapa_loss=0.0001938, whisper_loss=0.09103, over 23238.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.009584, ecapa_loss=0.0001611, whisper_loss=0.09159, over 1552958.43 frames. ], batch size: 93, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:27:10,951 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 10:27:16,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.614e+01 2.833e+01 3.144e+01 8.943e+01, threshold=5.666e+01, percent-clipped=3.0 2024-08-14 10:27:18,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2609520.0, ans=0.0 2024-08-14 10:27:20,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2609520.0, ans=0.125 2024-08-14 10:27:22,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2609520.0, ans=0.025 2024-08-14 10:27:52,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2609720.0, ans=0.125 2024-08-14 10:27:59,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2609720.0, ans=0.2 2024-08-14 10:28:10,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2609720.0, ans=0.125 2024-08-14 10:28:20,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2024-08-14 10:28:31,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2609820.0, ans=0.125 2024-08-14 10:28:56,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 150, loss[loss=0.1014, beats_loss=0.009046, ecapa_loss=0.0001744, whisper_loss=0.09057, over 19960.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.009525, ecapa_loss=0.0001608, whisper_loss=0.09174, over 2068814.06 frames. 
], batch size: 80, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:29:00,773 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 10:29:08,117 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-14 10:29:12,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2610020.0, ans=0.125 2024-08-14 10:29:15,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2610120.0, ans=0.1 2024-08-14 10:29:24,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2610120.0, ans=0.125 2024-08-14 10:29:25,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2610120.0, ans=0.0 2024-08-14 10:29:30,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2610220.0, ans=0.125 2024-08-14 10:29:31,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2610220.0, ans=0.04949747468305833 2024-08-14 10:29:41,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. limit=10.0 2024-08-14 10:29:54,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2610320.0, ans=0.0 2024-08-14 10:30:01,814 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 10:30:13,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2610420.0, ans=0.0 2024-08-14 10:30:15,896 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 17 from Vox, 24 from AS 2024-08-14 10:30:18,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 200, loss[loss=0.09824, beats_loss=0.01112, ecapa_loss=0.0001372, whisper_loss=0.08575, over 22180.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.00978, ecapa_loss=0.0001589, whisper_loss=0.09153, over 2456369.89 frames. ], batch size: 88, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:30:25,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.407e+01 2.774e+01 3.039e+01 4.574e+01, threshold=5.548e+01, percent-clipped=0.0 2024-08-14 10:30:29,002 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 from AS 2024-08-14 10:30:38,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2610620.0, ans=0.125 2024-08-14 10:30:42,893 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 14 from Vox, 45 from AS 2024-08-14 10:30:46,186 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 from AS 2024-08-14 10:30:58,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-08-14 10:31:05,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.19 vs. limit=15.0 2024-08-14 10:31:08,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.60 vs. 
limit=22.5 2024-08-14 10:31:15,117 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:31:22,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2610920.0, ans=0.0 2024-08-14 10:31:29,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2610920.0, ans=0.0 2024-08-14 10:31:37,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 250, loss[loss=0.09675, beats_loss=0.01071, ecapa_loss=0.00014, whisper_loss=0.08464, over 19256.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01002, ecapa_loss=0.0001574, whisper_loss=0.09156, over 2769049.52 frames. ], batch size: 73, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:32:29,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2611320.0, ans=0.1 2024-08-14 10:32:31,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2611320.0, ans=0.0 2024-08-14 10:32:45,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2611420.0, ans=0.125 2024-08-14 10:32:45,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-14 10:32:53,066 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
21 from LS+wenet, 19 from Vox, 32 from AS 2024-08-14 10:32:55,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2611420.0, ans=0.125 2024-08-14 10:33:01,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 300, loss[loss=0.09578, beats_loss=0.01087, ecapa_loss=0.0001302, whisper_loss=0.08361, over 20149.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01009, ecapa_loss=0.0001581, whisper_loss=0.09133, over 3013165.78 frames. ], batch size: 78, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:33:03,142 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 from AS 2024-08-14 10:33:07,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2611520.0, ans=0.125 2024-08-14 10:33:10,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.366e+01 2.598e+01 2.945e+01 2.183e+02, threshold=5.197e+01, percent-clipped=2.0 2024-08-14 10:33:16,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2611520.0, ans=0.125 2024-08-14 10:33:21,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=2611620.0, ans=12.0 2024-08-14 10:33:47,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-08-14 10:34:04,560 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 from AS 2024-08-14 10:34:12,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2611920.0, ans=0.2 2024-08-14 10:34:13,532 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 10:34:18,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2611920.0, ans=0.05 2024-08-14 10:34:23,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2024-08-14 10:34:29,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 350, loss[loss=0.1068, beats_loss=0.01067, ecapa_loss=0.0001441, whisper_loss=0.09466, over 17714.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01021, ecapa_loss=0.0001579, whisper_loss=0.09162, over 3208141.76 frames. ], batch size: 71, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:34:33,255 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 26 from Vox, 34 from AS 2024-08-14 10:34:54,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2612120.0, ans=0.0 2024-08-14 10:35:11,715 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-14 10:35:22,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2612320.0, ans=0.0 2024-08-14 10:35:24,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2612320.0, ans=0.2 2024-08-14 10:35:53,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 400, loss[loss=0.09921, beats_loss=0.01121, ecapa_loss=0.0001556, whisper_loss=0.08644, over 21970.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001568, whisper_loss=0.09078, over 3358327.39 frames. 
], batch size: 88, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:35:58,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=22.5 2024-08-14 10:36:01,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.324e+01 2.549e+01 2.797e+01 3.225e+02, threshold=5.099e+01, percent-clipped=2.0 2024-08-14 10:36:16,673 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 26 from Vox, 27 from AS 2024-08-14 10:36:31,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2612720.0, ans=0.125 2024-08-14 10:36:33,904 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-14 10:36:45,808 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 from AS 2024-08-14 10:37:00,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2612920.0, ans=0.125 2024-08-14 10:37:05,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2612920.0, ans=0.07 2024-08-14 10:37:11,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2612920.0, ans=0.2 2024-08-14 10:37:17,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 450, loss[loss=0.05238, beats_loss=0.0129, ecapa_loss=0.0001236, whisper_loss=0.03825, over 17550.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001568, whisper_loss=0.09039, over 3488889.99 frames. 
], batch size: 71, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:37:33,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2613020.0, ans=0.125 2024-08-14 10:37:36,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2613120.0, ans=0.0 2024-08-14 10:37:39,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2613120.0, ans=0.0 2024-08-14 10:37:47,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2613120.0, ans=0.2 2024-08-14 10:37:57,076 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 10:38:22,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.09 vs. limit=15.0 2024-08-14 10:38:27,939 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 31 from Vox, 33 from AS 2024-08-14 10:38:31,597 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 11 from Vox, 30 from AS 2024-08-14 10:38:37,808 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 from AS 2024-08-14 10:38:43,588 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 from AS 2024-08-14 10:38:47,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 500, loss[loss=0.1115, beats_loss=0.01033, ecapa_loss=0.0001663, whisper_loss=0.09947, over 20506.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001554, whisper_loss=0.08935, over 3555112.29 frames. 
], batch size: 79, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:38:56,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.547e+01 2.928e+01 5.420e+01, threshold=5.093e+01, percent-clipped=1.0 2024-08-14 10:39:29,549 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS 2024-08-14 10:39:52,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2613820.0, ans=0.1 2024-08-14 10:39:54,767 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 10:40:00,544 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 from AS 2024-08-14 10:40:03,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.71 vs. limit=22.5 2024-08-14 10:40:12,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2613920.0, ans=0.1 2024-08-14 10:40:13,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2613920.0, ans=0.1 2024-08-14 10:40:15,146 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 13 from Vox, 37 from AS 2024-08-14 10:40:18,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 550, loss[loss=0.09417, beats_loss=0.01079, ecapa_loss=0.0001548, whisper_loss=0.08183, over 14269.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001559, whisper_loss=0.09082, over 3598224.86 frames. ], batch size: 55, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:40:20,854 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 17 from Vox, 29 from AS 2024-08-14 10:40:27,410 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 21 from Vox, 41 from AS 2024-08-14 10:40:30,069 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 from AS 2024-08-14 10:40:37,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2614120.0, ans=0.125 2024-08-14 10:40:41,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2614120.0, ans=0.2 2024-08-14 10:40:44,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2614120.0, ans=0.1 2024-08-14 10:40:57,801 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 from AS 2024-08-14 10:41:00,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2614220.0, ans=0.0 2024-08-14 10:41:08,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2614220.0, ans=0.125 2024-08-14 10:41:14,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2614320.0, ans=0.0 2024-08-14 10:41:34,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2614420.0, ans=0.1 2024-08-14 10:41:34,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2614420.0, ans=0.1 2024-08-14 10:41:37,382 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 from AS 2024-08-14 10:41:43,113 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
20 from LS+wenet, 22 from Vox, 38 from AS 2024-08-14 10:41:46,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 600, loss[loss=0.1033, beats_loss=0.01101, ecapa_loss=0.0001406, whisper_loss=0.0909, over 22104.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001536, whisper_loss=0.09118, over 3651895.06 frames. ], batch size: 88, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:41:53,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.288e+01 2.520e+01 2.805e+01 9.045e+01, threshold=5.041e+01, percent-clipped=2.0 2024-08-14 10:42:21,210 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 18 from LS+wenet, 31 from Vox, 39 from AS 2024-08-14 10:42:57,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2614920.0, ans=0.125 2024-08-14 10:42:59,169 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.035e-03 2024-08-14 10:43:02,706 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS 2024-08-14 10:43:03,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 650, loss[loss=0.08501, beats_loss=0.0113, ecapa_loss=0.0001772, whisper_loss=0.07194, over 16188.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001538, whisper_loss=0.09113, over 3658331.45 frames. ], batch size: 68, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:43:21,133 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
29 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 10:43:26,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2615120.0, ans=0.125 2024-08-14 10:43:40,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2615220.0, ans=0.125 2024-08-14 10:43:46,799 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 from AS 2024-08-14 10:43:51,230 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 10:43:54,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2024-08-14 10:44:08,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2615420.0, ans=0.2 2024-08-14 10:44:13,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 700, loss[loss=0.1064, beats_loss=0.01242, ecapa_loss=0.0001634, whisper_loss=0.09235, over 22095.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001538, whisper_loss=0.09091, over 3720408.15 frames. ], batch size: 87, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:44:14,979 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 20 from Vox, 19 from AS 2024-08-14 10:44:16,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2615520.0, ans=0.2 2024-08-14 10:44:19,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.455e+01 2.625e+01 2.898e+01 4.319e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-14 10:44:24,070 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 from AS 2024-08-14 10:44:40,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2615720.0, ans=0.125 2024-08-14 10:45:03,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2615820.0, ans=0.125 2024-08-14 10:45:10,661 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 from AS 2024-08-14 10:45:16,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2615920.0, ans=0.1 2024-08-14 10:45:20,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 750, loss[loss=0.1009, beats_loss=0.01219, ecapa_loss=0.0001379, whisper_loss=0.08731, over 22537.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001542, whisper_loss=0.09096, over 3742391.33 frames. ], batch size: 88, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:45:26,086 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 from AS 2024-08-14 10:45:38,417 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 from AS 2024-08-14 10:45:46,369 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 33 from Vox, 31 from AS 2024-08-14 10:45:50,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2616220.0, ans=0.125 2024-08-14 10:45:51,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2616220.0, ans=0.0 2024-08-14 10:46:04,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2616320.0, ans=0.2 2024-08-14 10:46:22,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=12.0 2024-08-14 10:46:27,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 800, loss[loss=0.1166, beats_loss=0.009192, ecapa_loss=0.0001716, whisper_loss=0.1057, over 14986.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.000154, whisper_loss=0.0901, over 3772667.89 frames. ], batch size: 57, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:46:30,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2024-08-14 10:46:33,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.304e+01 2.552e+01 2.845e+01 4.485e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 10:46:43,898 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 10:46:49,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2616620.0, ans=0.07 2024-08-14 10:47:01,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2616720.0, ans=0.1 2024-08-14 10:47:05,374 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-14 10:47:08,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-14 10:47:34,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 850, loss[loss=0.1071, beats_loss=0.009962, ecapa_loss=0.0001362, whisper_loss=0.09579, over 18668.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001541, whisper_loss=0.0899, over 3751707.59 frames. ], batch size: 71, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:47:39,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.31 vs. limit=10.0 2024-08-14 10:48:02,958 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS 2024-08-14 10:48:04,335 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-14 10:48:12,399 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 from AS 2024-08-14 10:48:15,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5 2024-08-14 10:48:22,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2617320.0, ans=0.0 2024-08-14 10:48:29,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2617420.0, ans=0.0 2024-08-14 10:48:40,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2617420.0, ans=0.2 2024-08-14 10:48:41,292 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 14 from Vox, 37 from AS 2024-08-14 10:48:42,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 900, loss[loss=0.1027, beats_loss=0.01105, ecapa_loss=0.0001133, whisper_loss=0.09055, over 19779.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001527, whisper_loss=0.08988, over 3768944.04 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:48:49,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.308e+01 2.548e+01 2.901e+01 4.285e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-14 10:48:53,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2617520.0, ans=0.125 2024-08-14 10:48:56,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2617620.0, ans=0.015 2024-08-14 10:49:15,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2617720.0, ans=0.125 2024-08-14 10:49:32,375 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 25 from Vox, 28 from AS 2024-08-14 10:49:40,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2617920.0, ans=0.0 2024-08-14 10:49:44,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2617920.0, ans=0.125 2024-08-14 10:49:48,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2618020.0, ans=0.125 2024-08-14 10:49:49,513 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 950, loss[loss=0.1135, beats_loss=0.006324, ecapa_loss=0.0001551, whisper_loss=0.1056, over 17928.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001532, whisper_loss=0.09017, over 3794371.83 frames. 
], batch size: 65, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:49:49,732 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 from AS 2024-08-14 10:49:58,001 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 from AS 2024-08-14 10:50:01,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5 2024-08-14 10:50:02,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2618120.0, ans=0.07 2024-08-14 10:50:07,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2024-08-14 10:50:18,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2618220.0, ans=0.125 2024-08-14 10:50:38,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2618320.0, ans=0.125 2024-08-14 10:50:40,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2618320.0, ans=0.0 2024-08-14 10:50:41,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2618320.0, ans=0.035 2024-08-14 10:50:43,777 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 from AS 2024-08-14 10:50:57,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1000, loss[loss=0.1111, beats_loss=0.0104, ecapa_loss=0.0001186, whisper_loss=0.09955, over 17742.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001513, whisper_loss=0.08958, over 3766628.66 frames. 
], batch size: 67, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:50:57,510 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 from AS 2024-08-14 10:50:58,930 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 from AS 2024-08-14 10:51:03,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.412e+01 2.681e+01 3.043e+01 1.164e+02, threshold=5.362e+01, percent-clipped=2.0 2024-08-14 10:51:09,007 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 from AS 2024-08-14 10:51:10,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2618620.0, ans=0.0 2024-08-14 10:51:13,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2618620.0, ans=0.125 2024-08-14 10:51:52,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-08-14 10:52:00,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2024-08-14 10:52:03,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1050, loss[loss=0.1143, beats_loss=0.01131, ecapa_loss=0.0001304, whisper_loss=0.1017, over 15439.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001512, whisper_loss=0.08959, over 3791509.05 frames. ], batch size: 59, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:52:09,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. 
limit=15.0 2024-08-14 10:52:18,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2619120.0, ans=0.125 2024-08-14 10:52:28,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2619120.0, ans=0.0 2024-08-14 10:52:36,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2619220.0, ans=0.125 2024-08-14 10:52:39,763 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 34 from LS+wenet, 21 from Vox, 25 from AS 2024-08-14 10:52:42,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5 2024-08-14 10:52:45,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2619320.0, ans=0.0 2024-08-14 10:52:47,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2619320.0, ans=0.125 2024-08-14 10:52:48,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2619320.0, ans=0.125 2024-08-14 10:53:10,026 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 10:53:11,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1100, loss[loss=0.09454, beats_loss=0.01224, ecapa_loss=0.0001635, whisper_loss=0.08067, over 18195.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001513, whisper_loss=0.08993, over 3800363.36 frames. 
], batch size: 79, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:53:17,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.365e+01 2.665e+01 2.962e+01 1.430e+02, threshold=5.329e+01, percent-clipped=2.0 2024-08-14 10:53:19,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2619520.0, ans=0.125 2024-08-14 10:53:20,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2619520.0, ans=0.04949747468305833 2024-08-14 10:53:28,180 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 24 from Vox, 27 from AS 2024-08-14 10:53:28,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2619620.0, ans=0.1 2024-08-14 10:53:52,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2619820.0, ans=0.2 2024-08-14 10:54:17,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1150, loss[loss=0.1086, beats_loss=0.01154, ecapa_loss=0.0001288, whisper_loss=0.09578, over 18229.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001526, whisper_loss=0.09078, over 3833082.77 frames. ], batch size: 70, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:54:48,865 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 from AS 2024-08-14 10:55:02,234 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 10:55:06,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2620320.0, ans=0.125 2024-08-14 10:55:13,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2620420.0, ans=0.0 2024-08-14 10:55:14,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2620420.0, ans=10.0 2024-08-14 10:55:14,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2024-08-14 10:55:24,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1200, loss[loss=0.09138, beats_loss=0.009257, ecapa_loss=0.0001446, whisper_loss=0.08067, over 16488.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001519, whisper_loss=0.09043, over 3840483.66 frames. ], batch size: 61, lr: 3.33e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:55:31,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.349e+01 2.616e+01 2.854e+01 5.362e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 10:55:40,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2024-08-14 10:55:49,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2620620.0, ans=0.0 2024-08-14 10:55:51,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2620720.0, ans=0.0 2024-08-14 10:55:55,197 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-14 10:56:04,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2620820.0, ans=0.1 2024-08-14 10:56:12,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2620820.0, ans=0.1 2024-08-14 10:56:24,270 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 10:56:27,948 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 10:56:31,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1250, loss[loss=0.124, beats_loss=0.01043, ecapa_loss=0.0001433, whisper_loss=0.1121, over 20841.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001529, whisper_loss=0.09067, over 3838712.01 frames. ], batch size: 83, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:56:59,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2621220.0, ans=0.2 2024-08-14 10:57:05,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-14 10:57:05,706 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 10:57:15,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2621320.0, ans=0.125 2024-08-14 10:57:39,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1300, loss[loss=0.09686, beats_loss=0.011, ecapa_loss=0.0001785, whisper_loss=0.08408, over 19923.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001531, whisper_loss=0.09103, over 3854748.73 frames. 
], batch size: 81, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:57:45,722 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.286e+01 2.497e+01 2.754e+01 3.684e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 10:57:47,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2621520.0, ans=0.1 2024-08-14 10:57:48,571 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-14 10:57:48,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2621520.0, ans=0.125 2024-08-14 10:57:51,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-08-14 10:58:03,397 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 10:58:10,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-08-14 10:58:15,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2621720.0, ans=0.125 2024-08-14 10:58:17,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2621720.0, ans=0.125 2024-08-14 10:58:17,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.68 vs. 
limit=22.5 2024-08-14 10:58:20,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2621820.0, ans=0.125 2024-08-14 10:58:22,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2621820.0, ans=0.125 2024-08-14 10:58:38,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2024-08-14 10:58:45,681 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 10:58:46,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1350, loss[loss=0.08998, beats_loss=0.01541, ecapa_loss=0.0001338, whisper_loss=0.07324, over 15622.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001509, whisper_loss=0.09043, over 3828555.31 frames. ], batch size: 63, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:58:47,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2024-08-14 10:59:03,004 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 10:59:10,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-14 10:59:12,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0 2024-08-14 10:59:26,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. 
limit=8.0 2024-08-14 10:59:29,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2622320.0, ans=0.0 2024-08-14 10:59:40,044 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 25 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-14 10:59:44,349 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 11 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 10:59:46,014 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.463e-03 2024-08-14 10:59:49,575 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 10:59:52,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.83 vs. limit=12.0 2024-08-14 10:59:53,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1400, loss[loss=0.09342, beats_loss=0.01095, ecapa_loss=0.0001232, whisper_loss=0.08123, over 19285.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001501, whisper_loss=0.08976, over 3814569.57 frames. ], batch size: 75, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 10:59:59,297 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.310e-02 2024-08-14 10:59:59,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.361e+01 2.575e+01 2.810e+01 4.774e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-14 11:00:06,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2622620.0, ans=0.2 2024-08-14 11:00:32,669 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 11:00:39,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2622820.0, ans=0.0 2024-08-14 11:00:44,352 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 11:00:45,553 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 11:00:48,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2622920.0, ans=0.125 2024-08-14 11:00:48,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2622920.0, ans=0.125 2024-08-14 11:00:57,636 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 11:00:59,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1450, loss[loss=0.09409, beats_loss=0.009641, ecapa_loss=0.0001713, whisper_loss=0.08273, over 15886.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001495, whisper_loss=0.08916, over 3790784.92 frames. ], batch size: 64, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:01:23,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2623020.0, ans=0.0 2024-08-14 11:01:32,534 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 11:01:34,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2623120.0, ans=0.0 2024-08-14 11:01:48,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2623220.0, ans=0.125 2024-08-14 11:01:49,399 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 11:01:49,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2623220.0, ans=0.0 2024-08-14 11:02:21,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2623420.0, ans=0.0 2024-08-14 11:02:23,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1500, loss[loss=0.08893, beats_loss=0.01387, ecapa_loss=0.0001343, whisper_loss=0.07371, over 23024.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01062, ecapa_loss=0.000151, whisper_loss=0.08882, over 3774265.12 frames. ], batch size: 93, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:02:25,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2623520.0, ans=0.125 2024-08-14 11:02:29,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2024-08-14 11:02:30,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.366e+01 2.617e+01 2.967e+01 6.359e+01, threshold=5.234e+01, percent-clipped=3.0 2024-08-14 11:02:31,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2623520.0, ans=0.1 2024-08-14 11:02:46,602 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 11:02:52,641 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-14 11:02:58,853 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
21 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-14 11:03:06,997 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:03:07,996 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 11:03:10,855 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 10 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 11:03:29,391 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 11:03:37,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1550, loss[loss=0.1144, beats_loss=0.009651, ecapa_loss=0.0001231, whisper_loss=0.1035, over 17706.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01066, ecapa_loss=0.0001491, whisper_loss=0.08907, over 3776900.39 frames. ], batch size: 67, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:03:51,227 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 11:04:02,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2624120.0, ans=0.2 2024-08-14 11:04:09,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2624220.0, ans=0.125 2024-08-14 11:04:12,035 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 11:04:12,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2624220.0, ans=0.0 2024-08-14 11:04:14,856 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 11:04:31,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2624320.0, ans=0.025 2024-08-14 11:04:46,084 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 11:04:54,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1600, loss[loss=0.1042, beats_loss=0.01118, ecapa_loss=0.0001695, whisper_loss=0.09131, over 18440.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.000149, whisper_loss=0.08967, over 3813832.54 frames. ], batch size: 75, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:05:01,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.369e+01 2.524e+01 2.843e+01 4.192e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-14 11:05:02,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2624520.0, ans=0.125 2024-08-14 11:05:15,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2624620.0, ans=0.1 2024-08-14 11:05:23,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-14 11:05:33,413 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 11:05:58,849 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 11:06:10,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1650, loss[loss=0.08163, beats_loss=0.01449, ecapa_loss=0.0001094, whisper_loss=0.06604, over 20365.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001497, whisper_loss=0.08956, over 3829043.72 frames. 
], batch size: 82, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:06:11,543 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 11:06:12,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2024-08-14 11:06:13,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2625020.0, ans=0.0 2024-08-14 11:06:15,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2625020.0, ans=0.2 2024-08-14 11:06:18,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-14 11:06:19,359 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 11:06:22,272 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 11:06:23,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2625120.0, ans=0.2 2024-08-14 11:06:39,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2625220.0, ans=0.125 2024-08-14 11:06:50,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2625220.0, ans=0.5 2024-08-14 11:06:55,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2625320.0, ans=0.0 2024-08-14 11:07:06,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2625320.0, ans=0.125 2024-08-14 11:07:15,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2625420.0, ans=0.125 2024-08-14 11:07:23,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2625420.0, ans=0.0 2024-08-14 11:07:25,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1700, loss[loss=0.1108, beats_loss=0.01047, ecapa_loss=0.0001497, whisper_loss=0.0988, over 19072.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001502, whisper_loss=0.08996, over 3835541.96 frames. ], batch size: 75, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:07:26,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2625520.0, ans=0.0 2024-08-14 11:07:32,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.263e+01 2.524e+01 2.794e+01 4.972e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-14 11:07:33,814 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 11:07:51,663 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 11:08:23,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2625820.0, ans=0.125 2024-08-14 11:08:41,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1750, loss[loss=0.08173, beats_loss=0.01207, ecapa_loss=0.0001829, whisper_loss=0.06783, over 12869.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001506, whisper_loss=0.09082, over 3838467.54 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:08:46,032 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 11:08:47,382 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-14 11:08:47,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2626020.0, ans=0.0 2024-08-14 11:08:50,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2626020.0, ans=0.125 2024-08-14 11:08:58,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2626120.0, ans=0.125 2024-08-14 11:09:07,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2626120.0, ans=0.125 2024-08-14 11:09:09,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2626120.0, ans=0.125 2024-08-14 11:09:13,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.05 vs. 
limit=22.5 2024-08-14 11:09:32,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2626320.0, ans=0.0 2024-08-14 11:09:35,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2626320.0, ans=0.125 2024-08-14 11:09:37,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2626320.0, ans=0.0 2024-08-14 11:09:45,453 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 11:09:51,381 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 11:09:54,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2626520.0, ans=15.0 2024-08-14 11:09:55,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1800, loss[loss=0.1024, beats_loss=0.01148, ecapa_loss=0.0001437, whisper_loss=0.08953, over 19432.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001515, whisper_loss=0.09056, over 3816587.49 frames. ], batch size: 79, lr: 3.32e-03, grad_scale: 1.152921504606847e+18 2024-08-14 11:09:56,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2626520.0, ans=0.1 2024-08-14 11:09:57,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2626520.0, ans=0.1 2024-08-14 11:10:05,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.260e+01 2.552e+01 2.816e+01 4.964e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-14 11:10:20,370 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 11:10:28,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2626720.0, ans=0.125 2024-08-14 11:10:42,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2626820.0, ans=0.125 2024-08-14 11:10:49,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2626820.0, ans=0.0 2024-08-14 11:11:03,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=12.0 2024-08-14 11:11:11,071 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1850, loss[loss=0.1097, beats_loss=0.009879, ecapa_loss=0.0001698, whisper_loss=0.09817, over 19923.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001518, whisper_loss=0.09064, over 3806449.05 frames. ], batch size: 82, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:11:17,721 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2024-08-14 11:11:22,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2627020.0, ans=0.1 2024-08-14 11:11:33,350 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:12:25,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1900, loss[loss=0.1116, beats_loss=0.00842, ecapa_loss=0.0001993, whisper_loss=0.1012, over 16424.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001521, whisper_loss=0.08963, over 3768978.75 frames. 
], batch size: 67, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:12:33,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.313e+01 2.505e+01 2.769e+01 4.411e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-14 11:12:48,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2627620.0, ans=0.015 2024-08-14 11:12:54,746 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 11:13:00,701 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 11:13:38,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=2628020.0, ans=0.2 2024-08-14 11:13:39,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 1950, loss[loss=0.1028, beats_loss=0.007817, ecapa_loss=0.0001788, whisper_loss=0.09321, over 15968.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001511, whisper_loss=0.08953, over 3779106.61 frames. 
], batch size: 63, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:13:44,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2628020.0, ans=0.1 2024-08-14 11:13:53,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2628120.0, ans=0.125 2024-08-14 11:14:10,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2628220.0, ans=0.1 2024-08-14 11:14:21,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2628220.0, ans=0.125 2024-08-14 11:14:21,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2628220.0, ans=0.125 2024-08-14 11:14:29,054 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 11:14:36,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2628320.0, ans=0.0 2024-08-14 11:14:37,462 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-14 11:14:54,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2000, loss[loss=0.09281, beats_loss=0.01166, ecapa_loss=0.0001165, whisper_loss=0.07998, over 14533.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.000151, whisper_loss=0.08959, over 3785833.00 frames. 
], batch size: 54, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:15:04,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.355e+01 2.639e+01 2.929e+01 2.426e+02, threshold=5.277e+01, percent-clipped=2.0 2024-08-14 11:15:20,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2628620.0, ans=0.125 2024-08-14 11:15:30,326 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 20 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 11:15:33,559 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 27 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-14 11:15:35,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2628720.0, ans=0.025 2024-08-14 11:15:37,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2628720.0, ans=0.125 2024-08-14 11:15:39,438 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 11:15:46,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2628820.0, ans=0.1 2024-08-14 11:15:53,833 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 11:15:54,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=12.0 2024-08-14 11:15:55,226 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 11:15:58,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.89 vs. 
limit=10.0 2024-08-14 11:16:12,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2050, loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001557, whisper_loss=0.0908, over 19787.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001518, whisper_loss=0.09047, over 3815304.24 frames. ], batch size: 80, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:16:14,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2629020.0, ans=0.125 2024-08-14 11:16:29,525 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 11:16:58,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2629320.0, ans=0.2 2024-08-14 11:17:06,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.34 vs. limit=22.5 2024-08-14 11:17:22,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-14 11:17:24,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2629420.0, ans=0.035 2024-08-14 11:17:30,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2100, loss[loss=0.08716, beats_loss=0.013, ecapa_loss=0.0001427, whisper_loss=0.07274, over 14539.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001516, whisper_loss=0.0899, over 3823980.33 frames. 
], batch size: 59, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:17:36,134 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.694e+00 2024-08-14 11:17:39,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.267e+01 2.457e+01 2.784e+01 3.709e+01, threshold=4.913e+01, percent-clipped=0.0 2024-08-14 11:17:53,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2629620.0, ans=0.125 2024-08-14 11:17:59,306 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09374744445085526, model_norm_threshold=49.13302230834961 2024-08-14 11:17:59,484 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.336e+04, grad_sumsq=7.336e+04, orig_rms_sq=1.000e+00 2024-08-14 11:18:06,253 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 11:18:12,382 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-14 11:18:17,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-14 11:18:23,406 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-14 11:18:30,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2629820.0, ans=0.2 2024-08-14 11:18:35,996 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 9 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 11:18:47,036 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
28 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-14 11:18:49,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2150, loss[loss=0.1042, beats_loss=0.009689, ecapa_loss=0.0001871, whisper_loss=0.09262, over 21389.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001509, whisper_loss=0.08977, over 3784508.29 frames. ], batch size: 86, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:19:04,370 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 11:19:05,821 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 11:19:09,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2630120.0, ans=0.125 2024-08-14 11:19:34,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5 2024-08-14 11:19:36,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2630320.0, ans=0.125 2024-08-14 11:19:46,768 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 29 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-14 11:20:08,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2200, loss[loss=0.07828, beats_loss=0.01297, ecapa_loss=0.0001384, whisper_loss=0.06392, over 15119.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001517, whisper_loss=0.08988, over 3770827.88 frames. ], batch size: 63, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:20:11,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2630520.0, ans=0.2 2024-08-14 11:20:13,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=15.0 2024-08-14 11:20:15,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2630520.0, ans=0.1 2024-08-14 11:20:16,578 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 11:20:17,594 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.405e+01 2.616e+01 2.970e+01 5.241e+02, threshold=5.232e+01, percent-clipped=2.0 2024-08-14 11:20:21,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2630520.0, ans=0.125 2024-08-14 11:20:29,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2630620.0, ans=0.125 2024-08-14 11:20:34,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2630620.0, ans=0.0 2024-08-14 11:20:54,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2024-08-14 11:20:58,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2630820.0, ans=0.1 2024-08-14 11:21:03,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2024-08-14 11:21:18,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2024-08-14 11:21:18,501 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 11:21:26,099 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 11:21:29,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2250, loss[loss=0.1121, beats_loss=0.01111, ecapa_loss=0.0001375, whisper_loss=0.09966, over 23125.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001515, whisper_loss=0.09101, over 3829853.18 frames. ], batch size: 88, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:21:29,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2631020.0, ans=0.0 2024-08-14 11:21:33,459 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 11:21:42,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2024-08-14 11:22:08,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2631220.0, ans=0.0 2024-08-14 11:22:13,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2631220.0, ans=0.0 2024-08-14 11:22:38,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2631420.0, ans=0.125 2024-08-14 11:22:49,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2300, loss[loss=0.09662, beats_loss=0.01135, ecapa_loss=0.0001221, whisper_loss=0.08404, over 14331.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001539, whisper_loss=0.09136, over 3840117.89 frames. ], batch size: 57, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:22:57,694 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 11:22:58,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.348e+01 2.609e+01 2.849e+01 2.533e+02, threshold=5.217e+01, percent-clipped=1.0 2024-08-14 11:23:00,830 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 11:23:06,072 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 20 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-14 11:23:36,012 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 11:23:36,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2631820.0, ans=0.2 2024-08-14 11:23:36,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. limit=6.0 2024-08-14 11:23:41,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2631820.0, ans=0.125 2024-08-14 11:23:46,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.33 vs. limit=22.5 2024-08-14 11:23:47,180 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-14 11:23:58,001 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 11:24:06,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2631920.0, ans=0.07 2024-08-14 11:24:08,787 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2350, loss[loss=0.1014, beats_loss=0.01225, ecapa_loss=0.0001209, whisper_loss=0.08795, over 22569.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.0001541, whisper_loss=0.09206, over 3879538.42 frames. 
], batch size: 88, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:24:43,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2632220.0, ans=0.0 2024-08-14 11:24:46,545 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 11:24:46,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2632220.0, ans=0.0 2024-08-14 11:25:00,719 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 11:25:11,653 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 11:25:29,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2400, loss[loss=0.07908, beats_loss=0.01223, ecapa_loss=0.0001478, whisper_loss=0.06537, over 20745.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01051, ecapa_loss=0.0001541, whisper_loss=0.09218, over 3887377.17 frames. ], batch size: 88, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:25:38,059 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.318e+01 2.574e+01 2.948e+01 5.851e+01, threshold=5.149e+01, percent-clipped=1.0 2024-08-14 11:25:39,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=17.03 vs. limit=15.0 2024-08-14 11:25:42,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2632520.0, ans=0.1 2024-08-14 11:25:46,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.66 vs. 
limit=10.0 2024-08-14 11:25:48,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-14 11:25:53,325 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-14 11:26:07,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-14 11:26:10,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-14 11:26:10,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2024-08-14 11:26:13,126 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-14 11:26:17,541 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-14 11:26:19,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2632820.0, ans=0.125 2024-08-14 11:26:21,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2632820.0, ans=0.125 2024-08-14 11:26:22,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2632820.0, ans=0.1 2024-08-14 11:26:28,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2632820.0, ans=0.1 2024-08-14 11:26:31,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2632920.0, ans=0.2 2024-08-14 11:26:37,742 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 11:26:43,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.29 vs. limit=10.0 2024-08-14 11:26:46,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2450, loss[loss=0.1133, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.1013, over 15998.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01051, ecapa_loss=0.0001531, whisper_loss=0.092, over 3871517.14 frames. ], batch size: 64, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:26:48,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. 
limit=15.0 2024-08-14 11:27:04,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2633120.0, ans=0.1 2024-08-14 11:27:15,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.81 vs. limit=6.0 2024-08-14 11:27:18,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2633220.0, ans=0.125 2024-08-14 11:27:30,418 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:27:38,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2633320.0, ans=0.0 2024-08-14 11:27:47,221 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 11:28:03,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2500, loss[loss=0.08655, beats_loss=0.01191, ecapa_loss=0.000164, whisper_loss=0.073, over 21167.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001537, whisper_loss=0.09185, over 3863745.05 frames. ], batch size: 88, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:28:12,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.237e+01 2.442e+01 2.717e+01 4.288e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-14 11:28:23,616 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-14 11:28:35,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. 
limit=10.0 2024-08-14 11:28:38,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2024-08-14 11:29:08,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2633920.0, ans=0.125 2024-08-14 11:29:12,733 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 11:29:20,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2550, loss[loss=0.1122, beats_loss=0.01089, ecapa_loss=0.0001147, whisper_loss=0.1002, over 23495.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001536, whisper_loss=0.09178, over 3889423.41 frames. ], batch size: 88, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:29:31,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2634020.0, ans=0.0 2024-08-14 11:29:51,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2634120.0, ans=0.1 2024-08-14 11:30:03,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2634220.0, ans=10.0 2024-08-14 11:30:12,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2634320.0, ans=0.125 2024-08-14 11:30:13,290 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 11:30:32,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2634420.0, ans=0.125 2024-08-14 11:30:40,416 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2600, loss[loss=0.1065, beats_loss=0.01055, ecapa_loss=0.0001443, whisper_loss=0.0945, over 19427.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001535, whisper_loss=0.09141, over 3880469.96 frames. ], batch size: 73, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:30:42,423 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:30:49,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.475e+01 2.751e+01 3.111e+01 1.109e+02, threshold=5.502e+01, percent-clipped=3.0 2024-08-14 11:30:54,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2634620.0, ans=0.125 2024-08-14 11:31:09,661 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 11:31:21,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0 2024-08-14 11:31:21,806 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 11:31:25,138 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 11:31:28,591 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-14 11:31:51,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2634920.0, ans=0.125 2024-08-14 11:31:57,224 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 11:31:57,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2635020.0, ans=0.125 2024-08-14 11:31:58,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2650, loss[loss=0.1052, beats_loss=0.01114, ecapa_loss=0.0001504, whisper_loss=0.0926, over 20238.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.000154, whisper_loss=0.09069, over 3881430.82 frames. ], batch size: 80, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:32:00,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2635020.0, ans=0.125 2024-08-14 11:32:05,995 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 11:32:11,579 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 18 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-14 11:32:16,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2024-08-14 11:32:17,877 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 11:32:18,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2635120.0, ans=0.0 2024-08-14 11:32:32,965 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 11:32:33,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2635220.0, ans=0.2 2024-08-14 11:32:38,966 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 16 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 11:33:05,068 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 11:33:07,870 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-14 11:33:13,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2700, loss[loss=0.09511, beats_loss=0.01144, ecapa_loss=0.0001808, whisper_loss=0.08187, over 19982.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.0001541, whisper_loss=0.09059, over 3893606.36 frames. ], batch size: 85, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:33:15,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-08-14 11:33:22,071 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.370e+01 2.672e+01 3.079e+01 4.287e+01, threshold=5.344e+01, percent-clipped=0.0 2024-08-14 11:33:52,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2635720.0, ans=0.0 2024-08-14 11:33:58,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2635720.0, ans=15.0 2024-08-14 11:34:07,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2635820.0, ans=0.125 2024-08-14 11:34:20,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-14 11:34:20,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. limit=10.0 2024-08-14 11:34:33,704 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 11:34:35,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2635920.0, ans=0.0 2024-08-14 11:34:37,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2750, loss[loss=0.09951, beats_loss=0.01188, ecapa_loss=0.0001722, whisper_loss=0.08591, over 16881.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01074, ecapa_loss=0.0001539, whisper_loss=0.09014, over 3905036.42 frames. ], batch size: 69, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:34:45,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2636020.0, ans=0.125 2024-08-14 11:35:11,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2636220.0, ans=0.1 2024-08-14 11:35:13,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-14 11:35:24,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2636220.0, ans=0.0 2024-08-14 11:35:45,706 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 11:35:52,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2636420.0, ans=0.1 2024-08-14 11:36:07,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2800, loss[loss=0.05675, beats_loss=0.01369, ecapa_loss=0.0001567, whisper_loss=0.04149, over 13549.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001538, whisper_loss=0.09046, over 3875082.82 frames. 
], batch size: 56, lr: 3.32e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:36:19,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.323e+01 2.596e+01 2.984e+01 3.829e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-14 11:36:22,168 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 11:36:23,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2636520.0, ans=0.1 2024-08-14 11:36:34,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2636620.0, ans=0.125 2024-08-14 11:36:53,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2024-08-14 11:36:56,532 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 11:37:14,298 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 11:37:23,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=22.5 2024-08-14 11:37:41,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2636920.0, ans=0.125 2024-08-14 11:37:48,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2850, loss[loss=0.09232, beats_loss=0.01118, ecapa_loss=0.0001805, whisper_loss=0.07933, over 16131.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001539, whisper_loss=0.09045, over 3860070.23 frames. ], batch size: 70, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:37:52,626 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 11:37:56,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2637020.0, ans=0.125 2024-08-14 11:38:17,718 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-14 11:38:19,911 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 11:39:22,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2637320.0, ans=0.05 2024-08-14 11:39:23,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2637320.0, ans=0.125 2024-08-14 11:39:25,919 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 11:39:40,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2637420.0, ans=0.2 2024-08-14 11:39:50,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2637520.0, ans=0.125 2024-08-14 11:39:51,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2900, loss[loss=0.1039, beats_loss=0.01032, ecapa_loss=0.0001524, whisper_loss=0.09205, over 21949.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01068, ecapa_loss=0.0001556, whisper_loss=0.0914, over 3886070.08 frames. 
], batch size: 88, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:40:05,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.270e+01 2.545e+01 2.877e+01 7.977e+01, threshold=5.090e+01, percent-clipped=2.0 2024-08-14 11:40:17,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2637620.0, ans=0.0 2024-08-14 11:40:32,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=22.5 2024-08-14 11:40:36,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2024-08-14 11:40:43,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2637720.0, ans=0.125 2024-08-14 11:41:21,879 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 11:41:34,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-14 11:41:47,613 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 11:41:49,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2024-08-14 11:41:57,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 2950, loss[loss=0.113, beats_loss=0.01087, ecapa_loss=0.0001599, whisper_loss=0.1006, over 21823.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001573, whisper_loss=0.09184, over 3894575.39 frames. 
], batch size: 89, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:42:10,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2638020.0, ans=0.125 2024-08-14 11:42:11,232 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 11:42:29,290 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 11:42:39,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2638120.0, ans=0.125 2024-08-14 11:43:39,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2638420.0, ans=0.2 2024-08-14 11:43:50,382 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 11:43:56,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3000, loss[loss=0.1152, beats_loss=0.009917, ecapa_loss=0.0001561, whisper_loss=0.1037, over 21062.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01057, ecapa_loss=0.000157, whisper_loss=0.09279, over 3928813.46 frames. ], batch size: 81, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:43:56,127 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 11:44:34,437 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005472, whisper_loss=0.2471, over 922467.00 frames. 2024-08-14 11:44:51,388 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on SV_voxceleb1: loss=0.00425, beats_loss=0, ecapa_loss=0.000425, whisper_loss=0, over 939242.00 frames. 2024-08-14 11:46:48,030 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 11:46:48,034 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 11:46:57,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.515e+01 2.846e+01 3.137e+01 6.212e+01, threshold=5.693e+01, percent-clipped=1.0 2024-08-14 11:47:03,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=12.0 2024-08-14 11:47:08,170 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 11:47:10,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2638620.0, ans=0.125 2024-08-14 11:47:20,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-14 11:47:22,294 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 21 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-14 11:47:27,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-14 11:47:46,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2638820.0, ans=0.125 2024-08-14 11:47:49,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2638820.0, ans=0.1 2024-08-14 11:48:07,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3050, loss[loss=0.1156, beats_loss=0.009752, ecapa_loss=0.0001656, whisper_loss=0.1042, over 21559.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01054, ecapa_loss=0.0001578, whisper_loss=0.09326, over 3927990.25 frames. 
], batch size: 84, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:48:09,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2639020.0, ans=0.0 2024-08-14 11:48:15,831 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 11:48:29,278 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 11:49:16,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2639420.0, ans=0.1 2024-08-14 11:49:30,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3100, loss[loss=0.09759, beats_loss=0.01005, ecapa_loss=0.0001402, whisper_loss=0.08614, over 19330.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01058, ecapa_loss=0.0001587, whisper_loss=0.09258, over 3916683.21 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:49:34,635 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 11:49:35,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2639520.0, ans=0.1 2024-08-14 11:49:38,201 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 11:49:39,665 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.342e+01 2.628e+01 3.036e+01 4.820e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-14 11:49:47,756 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 30 from Vox, 20 fro AS 2024-08-14 11:50:17,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2639820.0, ans=0.05 2024-08-14 11:50:31,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2639920.0, ans=0.125 2024-08-14 11:50:41,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2639920.0, ans=0.025 2024-08-14 11:50:49,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3150, loss[loss=0.1124, beats_loss=0.01275, ecapa_loss=0.000166, whisper_loss=0.09795, over 17368.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01063, ecapa_loss=0.0001584, whisper_loss=0.09234, over 3889268.24 frames. ], batch size: 70, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:50:59,942 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 11:51:13,241 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 11:51:20,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=12.0 2024-08-14 11:51:23,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2640220.0, ans=0.1 2024-08-14 11:51:33,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.94 vs. limit=22.5 2024-08-14 11:51:39,315 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-14 11:51:44,479 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 11:51:47,395 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 11:51:48,867 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 11:51:52,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2640420.0, ans=0.125 2024-08-14 11:51:55,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2640420.0, ans=0.0 2024-08-14 11:51:56,657 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-14 11:52:06,892 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3200, loss[loss=0.1214, beats_loss=0.009443, ecapa_loss=0.0001533, whisper_loss=0.1104, over 21757.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001589, whisper_loss=0.0921, over 3869480.46 frames. ], batch size: 84, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:52:16,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.359e+01 2.591e+01 2.913e+01 5.020e+01, threshold=5.181e+01, percent-clipped=0.0 2024-08-14 11:52:19,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2640520.0, ans=0.1 2024-08-14 11:52:38,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2640720.0, ans=0.1 2024-08-14 11:53:05,734 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-14 11:53:09,527 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 11:53:22,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3250, loss[loss=0.09871, beats_loss=0.009681, ecapa_loss=0.000186, whisper_loss=0.08717, over 21526.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01062, ecapa_loss=0.0001586, whisper_loss=0.09252, over 3909700.70 frames. ], batch size: 88, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:53:35,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2641020.0, ans=0.2 2024-08-14 11:53:41,460 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 11:53:43,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2024-08-14 11:54:27,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-14 11:54:39,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2024-08-14 11:54:44,007 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3300, loss[loss=0.08991, beats_loss=0.0146, ecapa_loss=0.0001417, whisper_loss=0.07389, over 20308.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01061, ecapa_loss=0.0001587, whisper_loss=0.09217, over 3883046.63 frames. ], batch size: 83, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:54:46,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2641520.0, ans=0.125 2024-08-14 11:54:50,490 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 11:54:53,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2641520.0, ans=0.125 2024-08-14 11:54:54,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.343e+01 2.686e+01 3.135e+01 1.274e+02, threshold=5.372e+01, percent-clipped=3.0 2024-08-14 11:55:10,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2641620.0, ans=0.2 2024-08-14 11:55:30,028 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 11:55:33,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2641820.0, ans=0.5 2024-08-14 11:55:45,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2641820.0, ans=0.125 2024-08-14 11:55:55,943 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 11:55:59,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2641920.0, ans=0.125 2024-08-14 11:55:59,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.37 vs. limit=22.5 2024-08-14 11:56:04,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3350, loss[loss=0.09817, beats_loss=0.01009, ecapa_loss=0.000131, whisper_loss=0.08678, over 22316.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01054, ecapa_loss=0.0001576, whisper_loss=0.09251, over 3882024.04 frames. 
], batch size: 89, lr: 3.31e-03, grad_scale: 5.764607523034235e+17 2024-08-14 11:56:07,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2642020.0, ans=0.125 2024-08-14 11:56:10,750 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 11:56:59,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.72 vs. limit=22.5 2024-08-14 11:57:01,370 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 11:57:04,868 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 11:57:23,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3400, loss[loss=0.1021, beats_loss=0.01161, ecapa_loss=0.0001081, whisper_loss=0.08939, over 18728.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001574, whisper_loss=0.09172, over 3878079.14 frames. ], batch size: 72, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 11:57:28,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2642520.0, ans=15.0 2024-08-14 11:57:34,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.486e+01 2.836e+01 3.327e+01 1.695e+02, threshold=5.673e+01, percent-clipped=4.0 2024-08-14 11:57:37,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2024-08-14 11:57:44,171 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
22 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-14 11:58:14,055 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 11:58:43,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3450, loss[loss=0.1208, beats_loss=0.009264, ecapa_loss=0.0001783, whisper_loss=0.1098, over 15760.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01052, ecapa_loss=0.000159, whisper_loss=0.09203, over 3906822.65 frames. ], batch size: 62, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 11:58:57,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2643020.0, ans=0.2 2024-08-14 11:59:08,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2643120.0, ans=0.0 2024-08-14 11:59:08,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2643120.0, ans=0.0 2024-08-14 11:59:11,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2643120.0, ans=0.125 2024-08-14 11:59:22,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2643220.0, ans=0.05 2024-08-14 11:59:39,539 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 11:59:55,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2643420.0, ans=0.125 2024-08-14 11:59:58,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2643420.0, ans=0.0 2024-08-14 12:00:03,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. 
limit=15.0 2024-08-14 12:00:03,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3500, loss[loss=0.1053, beats_loss=0.01009, ecapa_loss=0.0001632, whisper_loss=0.09359, over 16804.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01052, ecapa_loss=0.0001587, whisper_loss=0.09242, over 3881634.87 frames. ], batch size: 68, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:00:10,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2643520.0, ans=0.2 2024-08-14 12:00:16,071 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.311e+01 2.583e+01 2.814e+01 3.893e+01, threshold=5.167e+01, percent-clipped=0.0 2024-08-14 12:00:42,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2643720.0, ans=0.125 2024-08-14 12:01:07,440 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 12:01:08,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2024-08-14 12:01:09,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2643920.0, ans=0.125 2024-08-14 12:01:20,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2643920.0, ans=0.0 2024-08-14 12:01:25,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2644020.0, ans=0.0 2024-08-14 12:01:26,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. 
limit=10.0 2024-08-14 12:01:26,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3550, loss[loss=0.07549, beats_loss=0.01309, ecapa_loss=0.0001384, whisper_loss=0.06101, over 15893.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001582, whisper_loss=0.09173, over 3888998.80 frames. ], batch size: 64, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:01:38,626 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 12:01:46,793 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 12:02:01,838 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-14 12:02:18,297 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:02:40,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2644420.0, ans=0.0 2024-08-14 12:02:42,264 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 12:02:48,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2644420.0, ans=0.0 2024-08-14 12:02:51,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3600, loss[loss=0.09955, beats_loss=0.01103, ecapa_loss=0.000203, whisper_loss=0.08649, over 15678.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001584, whisper_loss=0.09161, over 3862081.16 frames. 
], batch size: 67, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:03:01,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.481e+01 2.628e+01 2.850e+01 4.421e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-14 12:03:09,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2644620.0, ans=0.125 2024-08-14 12:03:09,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2644620.0, ans=0.1 2024-08-14 12:03:10,473 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 12:03:16,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-14 12:03:22,881 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 10 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 12:03:47,814 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 12:04:08,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3650, loss[loss=0.1191, beats_loss=0.008943, ecapa_loss=0.0001797, whisper_loss=0.1083, over 17186.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.000158, whisper_loss=0.0907, over 3837222.47 frames. ], batch size: 66, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:04:17,715 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-14 12:04:20,311 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 12:04:27,071 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 12:04:50,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2645220.0, ans=0.0 2024-08-14 12:05:05,501 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 12:05:13,228 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.727e+01 2024-08-14 12:05:24,168 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3700, loss[loss=0.09744, beats_loss=0.009928, ecapa_loss=0.0001816, whisper_loss=0.0857, over 19174.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001572, whisper_loss=0.09159, over 3853221.03 frames. ], batch size: 80, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:05:33,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.298e+01 2.531e+01 2.738e+01 1.071e+02, threshold=5.062e+01, percent-clipped=1.0 2024-08-14 12:05:36,047 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 12:05:45,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2024-08-14 12:05:50,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2645620.0, ans=0.1 2024-08-14 12:06:01,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2645720.0, ans=0.1 2024-08-14 12:06:02,457 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 12:06:08,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2645820.0, ans=0.2 2024-08-14 12:06:16,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.99 vs. limit=10.0 2024-08-14 12:06:19,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=12.0 2024-08-14 12:06:26,489 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 12:06:39,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3750, loss[loss=0.08788, beats_loss=0.0116, ecapa_loss=0.000169, whisper_loss=0.07459, over 22573.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001575, whisper_loss=0.09092, over 3849312.49 frames. ], batch size: 96, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:06:41,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2646020.0, ans=0.125 2024-08-14 12:06:48,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2646020.0, ans=0.0 2024-08-14 12:06:59,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2646120.0, ans=0.1 2024-08-14 12:07:10,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2646220.0, ans=0.2 2024-08-14 12:07:25,215 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
18 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-14 12:07:48,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2646420.0, ans=0.2 2024-08-14 12:07:55,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2646520.0, ans=0.125 2024-08-14 12:07:56,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3800, loss[loss=0.08478, beats_loss=0.01165, ecapa_loss=0.0001301, whisper_loss=0.07183, over 18776.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001579, whisper_loss=0.09098, over 3863197.41 frames. ], batch size: 72, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:08:05,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.378e+01 2.670e+01 2.953e+01 4.426e+01, threshold=5.341e+01, percent-clipped=0.0 2024-08-14 12:08:47,535 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-14 12:09:04,728 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 12:09:06,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2646920.0, ans=0.125 2024-08-14 12:09:08,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2646920.0, ans=0.0 2024-08-14 12:09:14,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3850, loss[loss=0.1093, beats_loss=0.007636, ecapa_loss=0.0001853, whisper_loss=0.09978, over 21771.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.000158, whisper_loss=0.09124, over 3853113.26 frames. ], batch size: 89, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:09:40,202 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
36 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 12:09:41,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2647120.0, ans=0.1 2024-08-14 12:09:47,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2647220.0, ans=0.125 2024-08-14 12:09:49,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2647220.0, ans=0.2 2024-08-14 12:09:57,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2647220.0, ans=0.125 2024-08-14 12:10:00,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-14 12:10:27,394 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-14 12:10:35,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3900, loss[loss=0.1109, beats_loss=0.01174, ecapa_loss=0.0001397, whisper_loss=0.09776, over 23575.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01066, ecapa_loss=0.0001579, whisper_loss=0.09131, over 3872956.74 frames. ], batch size: 94, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:10:48,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.360e+01 2.691e+01 2.914e+01 3.544e+02, threshold=5.383e+01, percent-clipped=1.0 2024-08-14 12:11:14,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. 
limit=22.5 2024-08-14 12:11:15,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2647720.0, ans=0.125 2024-08-14 12:11:17,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0 2024-08-14 12:11:22,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2647720.0, ans=0.1 2024-08-14 12:11:27,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=12.0 2024-08-14 12:11:39,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2647820.0, ans=0.1 2024-08-14 12:11:39,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2647820.0, ans=0.125 2024-08-14 12:12:05,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 3950, loss[loss=0.08306, beats_loss=0.01401, ecapa_loss=0.0001206, whisper_loss=0.06784, over 21913.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01068, ecapa_loss=0.0001578, whisper_loss=0.09189, over 3925252.33 frames. ], batch size: 87, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:12:10,447 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.230e-01 2024-08-14 12:12:14,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2648020.0, ans=0.0 2024-08-14 12:13:09,141 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 12:13:23,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2648320.0, ans=0.125 2024-08-14 12:13:52,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4000, loss[loss=0.1161, beats_loss=0.01086, ecapa_loss=0.0001538, whisper_loss=0.1037, over 18396.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01055, ecapa_loss=0.0001589, whisper_loss=0.09233, over 3924351.95 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:13:57,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2648520.0, ans=0.125 2024-08-14 12:13:59,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2648520.0, ans=0.05 2024-08-14 12:14:01,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2648520.0, ans=0.0 2024-08-14 12:14:02,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2024-08-14 12:14:07,568 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.464e+01 2.683e+01 2.941e+01 4.279e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-14 12:14:35,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2648620.0, ans=0.125 2024-08-14 12:14:45,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. 
limit=22.5 2024-08-14 12:14:47,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2648720.0, ans=0.1 2024-08-14 12:14:52,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2648720.0, ans=0.0 2024-08-14 12:14:56,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-14 12:15:25,696 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 12:15:52,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4050, loss[loss=0.1035, beats_loss=0.01218, ecapa_loss=0.0001443, whisper_loss=0.08991, over 23514.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01052, ecapa_loss=0.0001584, whisper_loss=0.09292, over 3939508.00 frames. ], batch size: 96, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:15:59,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2649020.0, ans=0.0 2024-08-14 12:16:28,025 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 12:16:48,372 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 12:16:55,866 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 12:17:00,559 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 12:17:17,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. 
limit=6.0 2024-08-14 12:17:20,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4100, loss[loss=0.1295, beats_loss=0.0102, ecapa_loss=0.0001183, whisper_loss=0.1182, over 15669.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01053, ecapa_loss=0.0001584, whisper_loss=0.0926, over 3928019.26 frames. ], batch size: 57, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:17:30,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2649520.0, ans=0.125 2024-08-14 12:17:33,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.297e+01 2.541e+01 2.897e+01 6.382e+01, threshold=5.082e+01, percent-clipped=1.0 2024-08-14 12:17:43,401 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-14 12:18:06,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0 2024-08-14 12:18:07,324 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 12:18:34,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2649920.0, ans=0.125 2024-08-14 12:18:46,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2649920.0, ans=0.125 2024-08-14 12:18:52,090 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-14 12:18:53,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4150, loss[loss=0.103, beats_loss=0.01159, ecapa_loss=0.000149, whisper_loss=0.08996, over 23440.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01064, ecapa_loss=0.0001591, whisper_loss=0.09184, over 3919072.07 frames. 
], batch size: 95, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:18:56,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2650020.0, ans=0.0 2024-08-14 12:19:01,755 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 12:19:06,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=12.0 2024-08-14 12:19:15,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2650120.0, ans=0.0 2024-08-14 12:19:15,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2024-08-14 12:19:17,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2650120.0, ans=0.1 2024-08-14 12:19:24,197 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 12:19:31,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2650220.0, ans=0.1 2024-08-14 12:19:33,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2650220.0, ans=0.125 2024-08-14 12:19:39,068 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 12:19:59,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2650420.0, ans=0.04949747468305833 2024-08-14 12:20:04,013 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 12:20:07,237 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-14 12:20:16,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4200, loss[loss=0.08075, beats_loss=0.01416, ecapa_loss=0.0001532, whisper_loss=0.06506, over 21048.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001568, whisper_loss=0.09132, over 3901953.54 frames. ], batch size: 89, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:20:16,818 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 28 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 12:20:25,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2024-08-14 12:20:27,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.390e+01 2.581e+01 2.872e+01 4.290e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-14 12:20:32,862 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-14 12:20:41,887 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-14 12:20:44,053 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:20:58,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2650720.0, ans=0.0 2024-08-14 12:20:58,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2650720.0, ans=0.0 2024-08-14 12:21:01,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2650720.0, ans=0.125 2024-08-14 12:21:15,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2650820.0, ans=0.0 2024-08-14 12:21:34,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2650920.0, ans=0.1 2024-08-14 12:21:36,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4250, loss[loss=0.1007, beats_loss=0.009599, ecapa_loss=0.000126, whisper_loss=0.08986, over 16766.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001563, whisper_loss=0.09083, over 3911760.15 frames. 
], batch size: 60, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:21:44,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2651020.0, ans=0.0 2024-08-14 12:21:50,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2651020.0, ans=0.05 2024-08-14 12:22:05,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2651120.0, ans=0.2 2024-08-14 12:22:17,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2024-08-14 12:22:27,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2651320.0, ans=0.0 2024-08-14 12:22:33,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2024-08-14 12:22:37,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2651320.0, ans=0.0 2024-08-14 12:22:49,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2651420.0, ans=0.0 2024-08-14 12:22:57,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4300, loss[loss=0.1073, beats_loss=0.009029, ecapa_loss=0.000185, whisper_loss=0.09645, over 22293.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01072, ecapa_loss=0.0001567, whisper_loss=0.09141, over 3898425.70 frames. ], batch size: 93, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:23:07,915 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 12:23:08,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.461e+01 2.630e+01 3.002e+01 3.746e+02, threshold=5.260e+01, percent-clipped=1.0 2024-08-14 12:23:19,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2651620.0, ans=0.0 2024-08-14 12:23:27,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2651620.0, ans=0.0 2024-08-14 12:23:46,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2651820.0, ans=0.125 2024-08-14 12:23:51,497 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 12:24:15,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4350, loss[loss=0.09892, beats_loss=0.01162, ecapa_loss=0.0001347, whisper_loss=0.08596, over 22995.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001569, whisper_loss=0.0912, over 3880757.10 frames. ], batch size: 94, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:24:17,554 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-14 12:24:28,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2652020.0, ans=0.125 2024-08-14 12:24:32,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2652120.0, ans=0.0 2024-08-14 12:24:37,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. 
limit=6.0 2024-08-14 12:24:47,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2652220.0, ans=0.0 2024-08-14 12:24:49,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2652220.0, ans=0.125 2024-08-14 12:24:50,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2652220.0, ans=0.1 2024-08-14 12:25:08,814 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 12:25:10,245 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-14 12:25:15,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2652420.0, ans=0.125 2024-08-14 12:25:22,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2652420.0, ans=0.1 2024-08-14 12:25:30,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4400, loss[loss=0.1357, beats_loss=0.007141, ecapa_loss=0.0001895, whisper_loss=0.1267, over 20610.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01069, ecapa_loss=0.0001565, whisper_loss=0.09156, over 3892698.69 frames. ], batch size: 78, lr: 3.31e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:25:31,000 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 12:25:40,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.397e+01 2.574e+01 2.948e+01 5.281e+01, threshold=5.148e+01, percent-clipped=1.0 2024-08-14 12:25:41,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2652520.0, ans=0.1 2024-08-14 12:25:54,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2652620.0, ans=0.125 2024-08-14 12:25:55,405 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 29 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 12:26:01,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2652720.0, ans=0.125 2024-08-14 12:26:04,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2652720.0, ans=0.1 2024-08-14 12:26:24,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2652820.0, ans=0.125 2024-08-14 12:26:43,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4450, loss[loss=0.1152, beats_loss=0.009214, ecapa_loss=0.000155, whisper_loss=0.1044, over 23073.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.000156, whisper_loss=0.09111, over 3884702.67 frames. 
], batch size: 93, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:26:47,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2653020.0, ans=0.125 2024-08-14 12:26:54,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2653020.0, ans=0.0 2024-08-14 12:26:57,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2653120.0, ans=0.125 2024-08-14 12:27:01,797 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 12:27:07,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2653120.0, ans=0.125 2024-08-14 12:27:10,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2653120.0, ans=0.0 2024-08-14 12:27:12,972 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 12:27:19,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2653220.0, ans=0.2 2024-08-14 12:27:30,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2653320.0, ans=0.0 2024-08-14 12:27:52,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2024-08-14 12:27:57,765 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4500, loss[loss=0.09703, beats_loss=0.01218, ecapa_loss=0.0001347, whisper_loss=0.0835, over 22594.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001563, whisper_loss=0.09097, over 3889271.84 frames. 
], batch size: 92, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:28:08,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+01 2.296e+01 2.547e+01 2.865e+01 4.084e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-14 12:28:28,758 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-14 12:28:29,119 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 12:28:30,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2653720.0, ans=0.2 2024-08-14 12:28:48,347 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-14 12:29:13,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4550, loss[loss=0.121, beats_loss=0.009907, ecapa_loss=0.0001472, whisper_loss=0.1096, over 22687.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001571, whisper_loss=0.09106, over 3891250.97 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:29:17,235 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 12:29:21,868 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 28 from LS+wenet, 23 from Vox, 15 fro AS 2024-08-14 12:29:53,654 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 12:30:04,250 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 18 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-14 12:30:05,574 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-14 12:30:20,030 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 12:30:22,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2654420.0, ans=0.2 2024-08-14 12:30:29,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4600, loss[loss=0.1057, beats_loss=0.01072, ecapa_loss=0.0001388, whisper_loss=0.0936, over 20447.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001568, whisper_loss=0.09067, over 3883988.82 frames. ], batch size: 79, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:30:34,598 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.518e-01 2024-08-14 12:30:40,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.344e+01 2.580e+01 2.840e+01 1.542e+02, threshold=5.160e+01, percent-clipped=2.0 2024-08-14 12:30:47,708 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 12:31:05,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2654720.0, ans=0.125 2024-08-14 12:31:16,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2654820.0, ans=0.1 2024-08-14 12:31:21,734 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
25 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-14 12:31:34,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2654920.0, ans=0.125 2024-08-14 12:31:40,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2654920.0, ans=0.2 2024-08-14 12:31:40,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2654920.0, ans=0.0 2024-08-14 12:31:46,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2655020.0, ans=15.0 2024-08-14 12:31:46,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4650, loss[loss=0.09245, beats_loss=0.01294, ecapa_loss=0.0001705, whisper_loss=0.07781, over 20459.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001584, whisper_loss=0.0904, over 3870128.32 frames. ], batch size: 87, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:31:49,132 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-14 12:31:55,483 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 12:32:14,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-14 12:32:37,728 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-14 12:32:54,287 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 12:32:56,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. 
limit=15.0 2024-08-14 12:33:13,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4700, loss[loss=0.09956, beats_loss=0.009844, ecapa_loss=0.0001292, whisper_loss=0.08843, over 14982.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.000158, whisper_loss=0.09079, over 3860051.44 frames. ], batch size: 54, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:33:25,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.291e+01 2.499e+01 2.871e+01 5.538e+01, threshold=4.999e+01, percent-clipped=1.0 2024-08-14 12:33:34,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2655620.0, ans=0.125 2024-08-14 12:33:36,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2024-08-14 12:33:56,603 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 14 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 12:34:03,384 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 12:34:23,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2655920.0, ans=0.2 2024-08-14 12:34:27,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2655920.0, ans=0.0 2024-08-14 12:34:37,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4750, loss[loss=0.102, beats_loss=0.01181, ecapa_loss=0.0001574, whisper_loss=0.08857, over 21511.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01072, ecapa_loss=0.0001569, whisper_loss=0.09015, over 3890489.93 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:34:40,981 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 12:34:54,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2656120.0, ans=0.0 2024-08-14 12:35:35,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2656420.0, ans=0.0 2024-08-14 12:35:40,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2656420.0, ans=0.0 2024-08-14 12:35:51,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4800, loss[loss=0.12, beats_loss=0.008816, ecapa_loss=0.0002288, whisper_loss=0.1089, over 16807.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001576, whisper_loss=0.09041, over 3914090.11 frames. ], batch size: 71, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:35:53,332 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 12:36:02,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.391e+01 2.624e+01 2.971e+01 4.050e+02, threshold=5.248e+01, percent-clipped=2.0 2024-08-14 12:36:24,345 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 12:36:46,572 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-14 12:37:00,117 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 12:37:01,604 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 12:37:05,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4850, loss[loss=0.122, beats_loss=0.007809, ecapa_loss=0.0001727, whisper_loss=0.1124, over 19019.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01084, ecapa_loss=0.0001553, whisper_loss=0.09011, over 3929550.74 frames. 
], batch size: 72, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:37:16,407 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 35 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 12:37:29,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2657120.0, ans=0.05 2024-08-14 12:37:36,484 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 12:37:38,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2657220.0, ans=0.1 2024-08-14 12:38:20,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4900, loss[loss=0.07732, beats_loss=0.01176, ecapa_loss=0.0001719, whisper_loss=0.06384, over 18631.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01086, ecapa_loss=0.0001543, whisper_loss=0.09052, over 3917742.38 frames. ], batch size: 78, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:38:31,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.386e+01 2.578e+01 2.812e+01 7.156e+01, threshold=5.157e+01, percent-clipped=2.0 2024-08-14 12:38:32,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2657520.0, ans=0.09899494936611666 2024-08-14 12:38:54,764 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-14 12:39:15,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-08-14 12:39:18,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.22 vs. 
limit=15.0 2024-08-14 12:39:36,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 4950, loss[loss=0.1045, beats_loss=0.01102, ecapa_loss=0.0001701, whisper_loss=0.0918, over 22379.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001562, whisper_loss=0.09065, over 3894562.66 frames. ], batch size: 94, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:39:39,787 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 12:40:00,447 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-14 12:40:13,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2658220.0, ans=0.1 2024-08-14 12:40:39,282 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 12:40:40,317 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 12:40:47,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2658420.0, ans=0.0 2024-08-14 12:40:49,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2658520.0, ans=0.0 2024-08-14 12:40:50,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5000, loss[loss=0.1133, beats_loss=0.01068, ecapa_loss=0.0001518, whisper_loss=0.1011, over 22814.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.09114, over 3877152.92 frames. 
], batch size: 90, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:41:01,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.269e+01 2.546e+01 2.965e+01 4.784e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 12:41:03,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2024-08-14 12:41:12,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2658620.0, ans=0.125 2024-08-14 12:41:13,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2658620.0, ans=0.0 2024-08-14 12:41:21,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2658720.0, ans=0.015 2024-08-14 12:41:37,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=22.5 2024-08-14 12:41:41,794 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 12:42:00,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=15.0 2024-08-14 12:42:04,310 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 12:42:05,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5050, loss[loss=0.0999, beats_loss=0.01061, ecapa_loss=0.0001519, whisper_loss=0.08777, over 20425.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01071, ecapa_loss=0.000155, whisper_loss=0.09189, over 3908738.93 frames. ], batch size: 81, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:42:13,540 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-14 12:42:21,631 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.095e+01 2024-08-14 12:42:23,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2659120.0, ans=0.125 2024-08-14 12:42:33,397 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 12:42:39,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2024-08-14 12:42:40,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2659220.0, ans=0.2 2024-08-14 12:42:49,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2659320.0, ans=0.04949747468305833 2024-08-14 12:42:53,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2659320.0, ans=0.125 2024-08-14 12:42:57,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2659320.0, ans=0.1 2024-08-14 12:42:58,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.85 vs. 
limit=22.5 2024-08-14 12:43:03,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2659320.0, ans=0.125 2024-08-14 12:43:04,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2659320.0, ans=0.125 2024-08-14 12:43:21,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5100, loss[loss=0.1107, beats_loss=0.01038, ecapa_loss=0.0001504, whisper_loss=0.09884, over 22948.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01066, ecapa_loss=0.0001549, whisper_loss=0.09284, over 3935989.25 frames. ], batch size: 92, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:43:25,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2024-08-14 12:43:28,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2659520.0, ans=0.04949747468305833 2024-08-14 12:43:32,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.354e+01 2.635e+01 2.968e+01 4.253e+01, threshold=5.269e+01, percent-clipped=0.0 2024-08-14 12:43:35,402 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-14 12:43:54,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2659720.0, ans=0.04949747468305833 2024-08-14 12:44:06,264 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 12:44:16,577 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 12:44:24,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2659920.0, ans=0.0 2024-08-14 12:44:28,933 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 12:44:36,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5150, loss[loss=0.09209, beats_loss=0.01094, ecapa_loss=0.0001412, whisper_loss=0.07973, over 19898.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01069, ecapa_loss=0.0001543, whisper_loss=0.09256, over 3921903.57 frames. ], batch size: 81, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:44:39,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2660020.0, ans=0.0 2024-08-14 12:45:13,088 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.740e+05 2024-08-14 12:45:23,313 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 12:45:29,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2660320.0, ans=0.1 2024-08-14 12:45:40,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2660420.0, ans=0.1 2024-08-14 12:45:51,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5200, loss[loss=0.09676, beats_loss=0.01033, ecapa_loss=0.0001501, whisper_loss=0.08493, over 16770.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01067, ecapa_loss=0.0001553, whisper_loss=0.0922, over 3904308.20 frames. 
], batch size: 68, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:45:58,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2660520.0, ans=0.125 2024-08-14 12:46:02,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.363e+01 2.791e+01 3.410e+01 2.422e+02, threshold=5.583e+01, percent-clipped=4.0 2024-08-14 12:46:06,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2660620.0, ans=0.0 2024-08-14 12:46:12,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2660620.0, ans=0.125 2024-08-14 12:46:24,201 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 12:46:28,370 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 12:46:57,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.81 vs. limit=10.0 2024-08-14 12:46:59,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2660920.0, ans=0.0 2024-08-14 12:47:06,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5250, loss[loss=0.1045, beats_loss=0.009323, ecapa_loss=0.0001747, whisper_loss=0.09346, over 22011.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001562, whisper_loss=0.09192, over 3912487.17 frames. ], batch size: 92, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:47:18,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2661020.0, ans=0.2 2024-08-14 12:47:32,538 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 12:47:40,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2661220.0, ans=0.0 2024-08-14 12:47:41,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2661220.0, ans=0.125 2024-08-14 12:47:44,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2661220.0, ans=0.05 2024-08-14 12:47:53,611 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 12:47:53,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2661320.0, ans=0.0 2024-08-14 12:47:58,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-14 12:48:09,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2661420.0, ans=0.125 2024-08-14 12:48:10,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2661420.0, ans=0.5 2024-08-14 12:48:19,002 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-14 12:48:20,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5300, loss[loss=0.1062, beats_loss=0.008303, ecapa_loss=0.0001823, whisper_loss=0.09603, over 13612.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01065, ecapa_loss=0.0001561, whisper_loss=0.09234, over 3908067.80 frames. 
], batch size: 56, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:48:29,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.292e+01 2.528e+01 2.841e+01 9.142e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-14 12:48:35,949 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 12:48:56,663 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 12:48:58,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2661720.0, ans=0.1 2024-08-14 12:49:00,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2661720.0, ans=0.2 2024-08-14 12:49:04,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2661820.0, ans=0.125 2024-08-14 12:49:04,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2661820.0, ans=0.125 2024-08-14 12:49:16,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2661820.0, ans=0.1 2024-08-14 12:49:22,247 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 12:49:25,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2661920.0, ans=0.025 2024-08-14 12:49:32,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2662020.0, ans=0.0 2024-08-14 12:49:33,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5350, loss[loss=0.09169, beats_loss=0.01227, ecapa_loss=0.0001381, whisper_loss=0.07803, over 17911.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0001546, whisper_loss=0.09196, over 3896161.27 frames. ], batch size: 69, lr: 3.30e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 12:49:44,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2662020.0, ans=0.0 2024-08-14 12:49:45,734 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07321541011333466, model_norm_threshold=50.560279846191406 2024-08-14 12:49:45,935 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.720e+04, grad_sumsq=6.720e+04, orig_rms_sq=1.000e+00 2024-08-14 12:50:24,030 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 12:50:37,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2662420.0, ans=0.09899494936611666 2024-08-14 12:50:48,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5400, loss[loss=0.1064, beats_loss=0.007083, ecapa_loss=0.0001669, whisper_loss=0.09767, over 16524.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01057, ecapa_loss=0.0001545, whisper_loss=0.09256, over 3877194.34 frames. 
], batch size: 58, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:50:54,518 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 12:50:58,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.310e+01 2.504e+01 2.679e+01 6.906e+02, threshold=5.009e+01, percent-clipped=1.0 2024-08-14 12:51:01,376 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 12:51:18,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2024-08-14 12:51:49,901 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 12:51:50,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2662920.0, ans=0.0 2024-08-14 12:51:52,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2662920.0, ans=0.125 2024-08-14 12:51:55,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2662920.0, ans=0.125 2024-08-14 12:52:00,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5450, loss[loss=0.1049, beats_loss=0.01081, ecapa_loss=0.0001627, whisper_loss=0.09242, over 17284.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01059, ecapa_loss=0.000155, whisper_loss=0.09263, over 3888819.62 frames. 
], batch size: 69, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:52:24,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2663120.0, ans=0.2 2024-08-14 12:52:45,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2663320.0, ans=0.0 2024-08-14 12:53:01,216 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 12:53:10,309 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 12:53:11,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-14 12:53:14,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5500, loss[loss=0.1236, beats_loss=0.009893, ecapa_loss=0.0001733, whisper_loss=0.112, over 22012.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01066, ecapa_loss=0.0001549, whisper_loss=0.09225, over 3924694.40 frames. ], batch size: 88, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:53:24,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.510e+01 2.706e+01 3.048e+01 6.260e+01, threshold=5.412e+01, percent-clipped=1.0 2024-08-14 12:53:29,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2663620.0, ans=0.0 2024-08-14 12:53:41,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2663620.0, ans=0.125 2024-08-14 12:53:43,093 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-14 12:53:44,334 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
14 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 12:53:48,618 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-14 12:53:50,365 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 12:53:55,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2663720.0, ans=0.0 2024-08-14 12:54:14,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2663920.0, ans=0.0 2024-08-14 12:54:16,869 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 12:54:20,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2663920.0, ans=0.125 2024-08-14 12:54:23,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-14 12:54:26,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2663920.0, ans=0.2 2024-08-14 12:54:28,840 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5550, loss[loss=0.1144, beats_loss=0.01107, ecapa_loss=0.0001496, whisper_loss=0.1018, over 22838.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001552, whisper_loss=0.09226, over 3923364.84 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:54:48,342 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 12:54:48,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2664120.0, ans=0.5 2024-08-14 12:55:02,002 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 12:55:06,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2664220.0, ans=0.1 2024-08-14 12:55:10,690 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 12:55:12,403 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-14 12:55:23,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2664320.0, ans=0.0 2024-08-14 12:55:41,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2664420.0, ans=0.125 2024-08-14 12:55:43,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5600, loss[loss=0.0834, beats_loss=0.01285, ecapa_loss=0.0001707, whisper_loss=0.06884, over 20794.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01074, ecapa_loss=0.0001554, whisper_loss=0.09192, over 3913517.15 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:55:54,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.335e+01 2.676e+01 3.034e+01 3.132e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-14 12:55:55,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2024-08-14 12:55:59,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2664620.0, ans=0.0 2024-08-14 12:56:07,667 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
28 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 12:56:24,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2664720.0, ans=0.125 2024-08-14 12:56:29,933 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 12:56:57,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5650, loss[loss=0.1156, beats_loss=0.01018, ecapa_loss=0.0001446, whisper_loss=0.104, over 17936.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001564, whisper_loss=0.0921, over 3933447.14 frames. ], batch size: 70, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:56:58,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2665020.0, ans=0.125 2024-08-14 12:56:58,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-14 12:57:10,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2665020.0, ans=0.1 2024-08-14 12:57:18,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-14 12:57:21,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2665120.0, ans=0.125 2024-08-14 12:57:25,823 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 12:57:37,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2665220.0, ans=0.125 2024-08-14 12:58:06,211 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 12:58:07,744 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 12:58:10,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5700, loss[loss=0.1155, beats_loss=0.01094, ecapa_loss=0.0001203, whisper_loss=0.1033, over 17415.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0107, ecapa_loss=0.0001561, whisper_loss=0.09187, over 3961136.34 frames. ], batch size: 64, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:58:16,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2665520.0, ans=0.0 2024-08-14 12:58:20,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.440e+01 2.658e+01 3.007e+01 5.166e+01, threshold=5.317e+01, percent-clipped=0.0 2024-08-14 12:58:47,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2665720.0, ans=0.0 2024-08-14 12:59:07,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-14 12:59:16,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2665920.0, ans=0.025 2024-08-14 12:59:25,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5750, loss[loss=0.1093, beats_loss=0.008146, ecapa_loss=0.0001665, whisper_loss=0.09948, over 16153.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01069, ecapa_loss=0.0001556, whisper_loss=0.0918, over 3942008.08 frames. 
], batch size: 63, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 12:59:30,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2666020.0, ans=0.125 2024-08-14 12:59:36,785 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.331e-02 2024-08-14 12:59:45,493 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 12:59:46,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2666120.0, ans=0.2 2024-08-14 12:59:47,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2666120.0, ans=0.0 2024-08-14 12:59:50,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=12.0 2024-08-14 12:59:54,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2666220.0, ans=0.0 2024-08-14 13:00:02,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2666220.0, ans=0.0 2024-08-14 13:00:08,941 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 13:00:09,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2666320.0, ans=0.2 2024-08-14 13:00:34,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2666420.0, ans=0.025 2024-08-14 13:00:36,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2666420.0, ans=0.0 2024-08-14 13:00:40,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5800, loss[loss=0.1087, beats_loss=0.008081, ecapa_loss=0.0001977, whisper_loss=0.0986, over 19132.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001561, whisper_loss=0.09088, over 3891604.80 frames. ], batch size: 76, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:00:47,798 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 13:00:50,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.671e+01 3.010e+01 5.088e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-14 13:01:02,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2666620.0, ans=0.125 2024-08-14 13:01:04,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2666620.0, ans=0.125 2024-08-14 13:01:27,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-14 13:01:28,366 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 13:01:37,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2666820.0, ans=0.125 2024-08-14 13:01:47,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2666920.0, ans=0.0 2024-08-14 13:01:51,790 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 13:01:54,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5850, loss[loss=0.1118, beats_loss=0.009184, ecapa_loss=0.0001658, whisper_loss=0.101, over 20890.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001556, whisper_loss=0.09096, over 3873543.97 frames. ], batch size: 85, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:01:55,132 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 13:01:59,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2667020.0, ans=0.0 2024-08-14 13:02:03,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. limit=10.0 2024-08-14 13:02:07,544 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-14 13:02:36,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2667220.0, ans=0.0 2024-08-14 13:02:39,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2667320.0, ans=0.1 2024-08-14 13:02:46,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2667320.0, ans=0.125 2024-08-14 13:02:48,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2667320.0, ans=0.125 2024-08-14 13:02:51,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2667320.0, ans=0.0 2024-08-14 13:02:56,773 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-14 13:03:08,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2667520.0, ans=0.0 2024-08-14 13:03:08,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5900, loss[loss=0.08642, beats_loss=0.01269, ecapa_loss=0.000146, whisper_loss=0.07227, over 19405.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.0001555, whisper_loss=0.09086, over 3857308.45 frames. ], batch size: 80, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:03:16,524 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 13:03:18,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.335e+01 2.608e+01 2.996e+01 4.185e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 13:03:27,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2667620.0, ans=0.07 2024-08-14 13:03:47,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2667720.0, ans=0.1 2024-08-14 13:03:49,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2667720.0, ans=0.0 2024-08-14 13:03:52,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2667820.0, ans=0.0 2024-08-14 13:04:17,191 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 13:04:22,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 5950, loss[loss=0.09718, beats_loss=0.01308, ecapa_loss=0.000148, whisper_loss=0.08262, over 22343.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01081, ecapa_loss=0.0001557, whisper_loss=0.09011, over 3861105.19 frames. ], batch size: 92, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:04:28,760 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 13:04:43,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2668120.0, ans=0.0 2024-08-14 13:04:58,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2668220.0, ans=0.1 2024-08-14 13:05:35,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2668420.0, ans=0.09899494936611666 2024-08-14 13:05:37,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6000, loss[loss=0.09757, beats_loss=0.009119, ecapa_loss=0.000163, whisper_loss=0.08682, over 18581.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01085, ecapa_loss=0.0001547, whisper_loss=0.09004, over 3872862.65 frames. ], batch size: 73, lr: 3.30e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:05:37,211 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 13:06:13,770 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000548, whisper_loss=0.2476, over 922467.00 frames. 2024-08-14 13:06:29,948 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on SV_voxceleb1: loss=0.004318, beats_loss=0, ecapa_loss=0.0004318, whisper_loss=0, over 939242.00 frames. 2024-08-14 13:08:18,983 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on AT_audioset: loss=0.02353, beats_loss=0.02353, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 13:08:18,992 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 13:08:27,047 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:08:29,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.205e+01 2.512e+01 2.812e+01 4.887e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-14 13:08:44,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2668620.0, ans=0.125 2024-08-14 13:09:02,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2668820.0, ans=0.125 2024-08-14 13:09:10,763 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0662800669670105, model_norm_threshold=50.23476028442383 2024-08-14 13:09:10,967 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.141e+07, orig_rms_sq=9.876e-03 2024-08-14 13:09:16,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2668820.0, ans=0.0 2024-08-14 13:09:19,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2668920.0, ans=0.1 2024-08-14 13:09:34,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6050, loss[loss=0.09203, beats_loss=0.01018, ecapa_loss=0.0002136, whisper_loss=0.07971, over 21178.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001547, whisper_loss=0.09058, over 3857630.79 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:09:34,330 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 13:09:40,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-14 13:09:47,495 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 37 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 13:09:50,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2669120.0, ans=0.0 2024-08-14 13:09:53,721 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 13:09:59,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2669120.0, ans=0.0 2024-08-14 13:10:02,631 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 13:10:10,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2669220.0, ans=0.2 2024-08-14 13:10:15,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2669220.0, ans=0.0 2024-08-14 13:10:15,491 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-14 13:10:22,345 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 13:10:27,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2669320.0, ans=0.125 2024-08-14 13:10:42,683 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-14 13:10:47,663 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 13:10:48,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6100, loss[loss=0.09494, beats_loss=0.009945, ecapa_loss=0.0001518, whisper_loss=0.08347, over 15390.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01078, ecapa_loss=0.0001566, whisper_loss=0.09074, over 3876070.14 frames. ], batch size: 59, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:10:59,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.399e+01 2.783e+01 3.218e+01 7.579e+02, threshold=5.567e+01, percent-clipped=5.0 2024-08-14 13:11:07,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2669620.0, ans=0.0 2024-08-14 13:11:09,795 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 13:11:25,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2669720.0, ans=0.1 2024-08-14 13:11:33,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2669820.0, ans=0.2 2024-08-14 13:11:59,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2669920.0, ans=0.1 2024-08-14 13:12:03,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2670020.0, ans=0.125 2024-08-14 13:12:03,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2670020.0, ans=0.125 2024-08-14 13:12:04,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6150, loss[loss=0.1114, beats_loss=0.00946, ecapa_loss=0.0001725, whisper_loss=0.1002, over 23006.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001576, whisper_loss=0.09065, over 3898495.13 frames. ], batch size: 94, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:12:11,734 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-14 13:12:32,063 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 from AS 2024-08-14 13:12:36,643 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS 2024-08-14 13:12:43,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2024-08-14 13:12:59,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2670320.0, ans=0.2 2024-08-14 13:13:02,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-14 13:13:08,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2670420.0, ans=0.125 2024-08-14 13:13:18,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6200, loss[loss=0.1099, beats_loss=0.01031, ecapa_loss=0.0001531, whisper_loss=0.0981, over 22872.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0108, ecapa_loss=0.0001563, whisper_loss=0.09082, over 3901913.13 frames. 
], batch size: 91, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:13:21,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2670520.0, ans=0.125 2024-08-14 13:13:22,933 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 13:13:28,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.359e+01 2.589e+01 2.919e+01 1.541e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 13:13:31,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-08-14 13:13:37,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2670620.0, ans=0.0 2024-08-14 13:13:41,000 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 from AS 2024-08-14 13:13:45,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2670620.0, ans=0.125 2024-08-14 13:13:47,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. 
limit=22.5 2024-08-14 13:13:52,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2670720.0, ans=0.1 2024-08-14 13:13:55,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2670720.0, ans=0.1 2024-08-14 13:13:57,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2670720.0, ans=0.0 2024-08-14 13:14:32,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6250, loss[loss=0.125, beats_loss=0.01049, ecapa_loss=0.0001608, whisper_loss=0.1129, over 20430.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01079, ecapa_loss=0.0001557, whisper_loss=0.091, over 3934764.33 frames. ], batch size: 80, lr: 3.29e-03, grad_scale: 5.764607523034235e+17 2024-08-14 13:14:32,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2671020.0, ans=0.2 2024-08-14 13:14:46,991 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 from AS 2024-08-14 13:14:55,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=8.0 2024-08-14 13:14:57,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2671120.0, ans=0.125 2024-08-14 13:15:04,996 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
18 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 13:15:11,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2671220.0, ans=0.1 2024-08-14 13:15:21,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2671320.0, ans=0.015 2024-08-14 13:15:34,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2671420.0, ans=0.125 2024-08-14 13:15:35,761 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 from AS 2024-08-14 13:15:36,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2671420.0, ans=0.09899494936611666 2024-08-14 13:15:41,525 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 17 from Vox, 36 from AS 2024-08-14 13:15:45,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6300, loss[loss=0.09937, beats_loss=0.009353, ecapa_loss=0.0001941, whisper_loss=0.08808, over 22001.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001563, whisper_loss=0.09096, over 3929282.91 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:15:45,673 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 32 from LS+wenet, 13 from Vox, 29 from AS 2024-08-14 13:15:50,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.08 vs. 
limit=15.0 2024-08-14 13:15:56,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2671520.0, ans=0.125 2024-08-14 13:15:57,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.285e+01 2.511e+01 2.818e+01 8.993e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-14 13:16:05,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2671620.0, ans=0.1 2024-08-14 13:16:09,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-14 13:16:12,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2671620.0, ans=0.0 2024-08-14 13:16:13,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2671720.0, ans=0.1 2024-08-14 13:16:18,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2671720.0, ans=0.1 2024-08-14 13:16:19,568 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 from AS 2024-08-14 13:16:30,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2024-08-14 13:16:32,789 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 16 from LS+wenet, 26 from Vox, 38 from AS 2024-08-14 13:16:55,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.54 vs. 
limit=22.5 2024-08-14 13:16:59,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2672020.0, ans=0.07 2024-08-14 13:17:00,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6350, loss[loss=0.1243, beats_loss=0.009563, ecapa_loss=0.0001322, whisper_loss=0.1135, over 22901.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001563, whisper_loss=0.09124, over 3931576.35 frames. ], batch size: 88, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:17:00,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2672020.0, ans=0.5 2024-08-14 13:17:17,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2672120.0, ans=0.125 2024-08-14 13:17:18,653 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS 2024-08-14 13:17:54,270 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 from AS 2024-08-14 13:18:10,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2672420.0, ans=0.0 2024-08-14 13:18:14,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6400, loss[loss=0.09283, beats_loss=0.0107, ecapa_loss=0.0001402, whisper_loss=0.08073, over 16860.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001554, whisper_loss=0.09105, over 3919794.85 frames. ], batch size: 66, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:18:16,069 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
24 from LS+wenet, 21 from Vox, 46 from AS 2024-08-14 13:18:25,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.344e+01 2.584e+01 2.860e+01 4.850e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-14 13:18:26,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2672520.0, ans=0.2 2024-08-14 13:18:37,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2672620.0, ans=0.125 2024-08-14 13:19:01,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-14 13:19:08,726 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 from AS 2024-08-14 13:19:09,135 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.652e-02 2024-08-14 13:19:13,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2672920.0, ans=10.0 2024-08-14 13:19:18,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2672920.0, ans=0.0 2024-08-14 13:19:28,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6450, loss[loss=0.118, beats_loss=0.007592, ecapa_loss=0.000187, whisper_loss=0.1085, over 21424.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001562, whisper_loss=0.09163, over 3936861.82 frames. ], batch size: 86, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:19:29,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2673020.0, ans=0.07 2024-08-14 13:19:41,905 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 19 from Vox, 45 from AS 2024-08-14 13:19:45,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2024-08-14 13:19:51,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2673120.0, ans=0.0 2024-08-14 13:20:02,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2673220.0, ans=0.0 2024-08-14 13:20:12,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2673320.0, ans=0.125 2024-08-14 13:20:16,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2673320.0, ans=0.2 2024-08-14 13:20:22,935 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 from AS 2024-08-14 13:20:27,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2673420.0, ans=0.125 2024-08-14 13:20:39,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-14 13:20:41,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6500, loss[loss=0.0969, beats_loss=0.01077, ecapa_loss=0.00014, whisper_loss=0.08473, over 15848.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01075, ecapa_loss=0.0001553, whisper_loss=0.09125, over 3932465.47 frames. ], batch size: 62, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:20:52,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. 
limit=15.0 2024-08-14 13:20:53,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.410e+01 2.661e+01 2.982e+01 1.028e+02, threshold=5.322e+01, percent-clipped=1.0 2024-08-14 13:21:20,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2673720.0, ans=0.125 2024-08-14 13:21:24,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2673820.0, ans=0.125 2024-08-14 13:21:27,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2673820.0, ans=0.125 2024-08-14 13:21:40,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-08-14 13:21:49,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2673920.0, ans=0.025 2024-08-14 13:21:55,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6550, loss[loss=0.1205, beats_loss=0.009497, ecapa_loss=0.0001371, whisper_loss=0.1096, over 14611.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01079, ecapa_loss=0.0001543, whisper_loss=0.0912, over 3943507.33 frames. 
], batch size: 54, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:21:56,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2674020.0, ans=0.1 2024-08-14 13:22:09,576 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.463e+00 2024-08-14 13:22:14,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2674120.0, ans=0.0 2024-08-14 13:22:18,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2674120.0, ans=0.125 2024-08-14 13:22:27,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2674220.0, ans=0.0 2024-08-14 13:22:34,054 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 21 from Vox, 43 from AS 2024-08-14 13:22:39,741 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 from AS 2024-08-14 13:22:47,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2674320.0, ans=0.1 2024-08-14 13:22:51,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2674320.0, ans=0.1 2024-08-14 13:22:51,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2674320.0, ans=0.07 2024-08-14 13:22:55,464 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 from AS 2024-08-14 13:23:05,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. 
limit=22.5 2024-08-14 13:23:06,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2674420.0, ans=0.0 2024-08-14 13:23:08,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6600, loss[loss=0.1372, beats_loss=0.007365, ecapa_loss=0.0001532, whisper_loss=0.1283, over 23587.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01065, ecapa_loss=0.0001556, whisper_loss=0.0926, over 3956643.93 frames. ], batch size: 87, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:23:08,896 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 from AS 2024-08-14 13:23:20,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.440e+01 2.726e+01 3.181e+01 5.119e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-14 13:23:21,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2674520.0, ans=0.2 2024-08-14 13:23:22,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2674620.0, ans=0.2 2024-08-14 13:23:37,019 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 26 from LS+wenet, 21 from Vox, 49 from AS 2024-08-14 13:24:10,100 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-14 13:24:11,420 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 from AS 2024-08-14 13:24:14,592 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 15 from LS+wenet, 29 from Vox, 34 from AS 2024-08-14 13:24:21,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6650, loss[loss=0.113, beats_loss=0.008675, ecapa_loss=0.0001782, whisper_loss=0.1026, over 22704.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01063, ecapa_loss=0.0001566, whisper_loss=0.0923, over 3960891.39 frames. 
], batch size: 88, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:24:28,146 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 from AS 2024-08-14 13:24:33,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2024-08-14 13:24:36,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2675120.0, ans=0.125 2024-08-14 13:24:39,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2024-08-14 13:24:39,918 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 from AS 2024-08-14 13:24:46,901 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 22 from Vox, 17 from AS 2024-08-14 13:24:50,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2675220.0, ans=0.125 2024-08-14 13:24:55,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2675220.0, ans=0.125 2024-08-14 13:25:26,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2024-08-14 13:25:33,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2675420.0, ans=0.0 2024-08-14 13:25:35,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6700, loss[loss=0.1142, beats_loss=0.01148, ecapa_loss=0.0001112, whisper_loss=0.1016, over 21738.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0106, ecapa_loss=0.0001563, whisper_loss=0.09259, over 3950813.29 frames. 
], batch size: 80, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:25:40,368 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 15 from Vox, 37 from AS 2024-08-14 13:25:47,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.392e+01 2.630e+01 2.889e+01 1.018e+02, threshold=5.259e+01, percent-clipped=2.0 2024-08-14 13:25:48,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2675520.0, ans=0.125 2024-08-14 13:25:49,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2675620.0, ans=0.125 2024-08-14 13:26:09,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2675720.0, ans=0.125 2024-08-14 13:26:12,681 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 from AS 2024-08-14 13:26:14,183 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS 2024-08-14 13:26:26,083 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 25 from Vox, 32 from AS 2024-08-14 13:26:36,681 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-14 13:26:38,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.47 vs. limit=10.0 2024-08-14 13:26:49,827 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6750, loss[loss=0.1133, beats_loss=0.009948, ecapa_loss=0.0001632, whisper_loss=0.1018, over 23021.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01062, ecapa_loss=0.0001553, whisper_loss=0.09215, over 3942739.29 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:26:59,309 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 19 from Vox, 40 from AS 2024-08-14 13:26:59,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2676020.0, ans=0.125 2024-08-14 13:27:12,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2676120.0, ans=0.125 2024-08-14 13:27:34,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2676320.0, ans=0.1 2024-08-14 13:27:34,695 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.408e-01 2024-08-14 13:27:59,716 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 from AS 2024-08-14 13:28:02,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6800, loss[loss=0.09235, beats_loss=0.009968, ecapa_loss=0.0001877, whisper_loss=0.08051, over 22138.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.0001574, whisper_loss=0.09151, over 3942881.19 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:28:13,461 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
36 from LS+wenet, 20 from Vox, 30 from AS 2024-08-14 13:28:14,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.378e+01 2.676e+01 3.043e+01 8.013e+01, threshold=5.353e+01, percent-clipped=1.0 2024-08-14 13:28:22,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2676620.0, ans=0.05 2024-08-14 13:28:36,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2676720.0, ans=0.5 2024-08-14 13:28:51,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2676820.0, ans=0.0 2024-08-14 13:28:58,087 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 from AS 2024-08-14 13:29:04,169 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 12 from Vox, 38 from AS 2024-08-14 13:29:05,607 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-14 13:29:14,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2676920.0, ans=0.125 2024-08-14 13:29:17,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6850, loss[loss=0.1043, beats_loss=0.01024, ecapa_loss=0.0001666, whisper_loss=0.0924, over 22589.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001568, whisper_loss=0.09126, over 3911985.01 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:29:33,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.20 vs. 
limit=12.0 2024-08-14 13:29:42,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677120.0, ans=0.1 2024-08-14 13:29:56,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2677220.0, ans=0.125 2024-08-14 13:30:17,275 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 15 from LS+wenet, 27 from Vox, 28 from AS 2024-08-14 13:30:18,903 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 from AS 2024-08-14 13:30:19,262 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.202e+05 2024-08-14 13:30:23,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2677420.0, ans=0.1 2024-08-14 13:30:25,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677420.0, ans=0.1 2024-08-14 13:30:27,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2677520.0, ans=0.125 2024-08-14 13:30:28,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6900, loss[loss=0.1027, beats_loss=0.01147, ecapa_loss=0.0001468, whisper_loss=0.08978, over 16964.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001557, whisper_loss=0.09076, over 3896233.41 frames. 
], batch size: 67, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:30:29,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2677520.0, ans=0.125 2024-08-14 13:30:36,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2677520.0, ans=0.1 2024-08-14 13:30:39,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.298e+01 2.502e+01 2.840e+01 6.631e+01, threshold=5.005e+01, percent-clipped=1.0 2024-08-14 13:30:47,173 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 28 from Vox, 42 from AS 2024-08-14 13:30:47,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-14 13:30:58,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2677720.0, ans=0.2 2024-08-14 13:31:02,576 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 24 from Vox, 27 from AS 2024-08-14 13:31:05,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2677720.0, ans=0.125 2024-08-14 13:31:14,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677820.0, ans=0.1 2024-08-14 13:31:18,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=2677820.0, ans=15.0 2024-08-14 13:31:19,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2677820.0, ans=0.125 2024-08-14 13:31:32,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2677920.0, ans=0.1 2024-08-14 13:31:36,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2677920.0, ans=0.1 2024-08-14 13:31:39,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 6950, loss[loss=0.1219, beats_loss=0.007992, ecapa_loss=0.0001797, whisper_loss=0.1121, over 17106.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001555, whisper_loss=0.0918, over 3903514.08 frames. ], batch size: 70, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:31:47,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=15.0 2024-08-14 13:31:47,920 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
17 from LS+wenet, 20 from Vox, 29 from AS 2024-08-14 13:31:51,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2678020.0, ans=0.125 2024-08-14 13:32:14,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5 2024-08-14 13:32:17,165 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 from AS 2024-08-14 13:32:23,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2678320.0, ans=0.125 2024-08-14 13:32:50,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7000, loss[loss=0.09784, beats_loss=0.0125, ecapa_loss=0.0001231, whisper_loss=0.08411, over 23918.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001548, whisper_loss=0.09135, over 3899036.59 frames. ], batch size: 94, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:32:52,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2678520.0, ans=0.125 2024-08-14 13:32:54,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2678520.0, ans=0.04949747468305833 2024-08-14 13:33:01,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.255e+01 2.474e+01 2.854e+01 4.338e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-14 13:33:08,354 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 from AS 2024-08-14 13:33:09,652 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 13:33:11,034 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 13:33:18,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2678720.0, ans=0.0 2024-08-14 13:33:20,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2678720.0, ans=0.1 2024-08-14 13:33:29,194 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 34 from Vox, 32 fro AS 2024-08-14 13:33:46,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2678920.0, ans=0.1 2024-08-14 13:33:47,820 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-14 13:33:50,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=12.0 2024-08-14 13:33:53,432 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-14 13:33:59,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2678920.0, ans=0.125 2024-08-14 13:34:01,702 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7050, loss[loss=0.1198, beats_loss=0.00997, ecapa_loss=0.0001562, whisper_loss=0.1083, over 18350.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001561, whisper_loss=0.09133, over 3922787.72 frames. ], batch size: 70, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:34:05,985 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 13:34:14,469 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 13:34:26,085 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 13:34:29,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-14 13:34:55,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2679320.0, ans=0.2 2024-08-14 13:34:56,721 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 13:34:57,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2679320.0, ans=0.0 2024-08-14 13:35:05,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2679420.0, ans=0.0 2024-08-14 13:35:13,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7100, loss[loss=0.1224, beats_loss=0.009072, ecapa_loss=0.0001549, whisper_loss=0.1118, over 23863.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001543, whisper_loss=0.09117, over 3927855.24 frames. ], batch size: 91, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:35:20,852 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 13:35:22,341 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 13:35:24,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.302e+01 2.502e+01 2.737e+01 3.925e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-14 13:35:42,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2679720.0, ans=0.125 2024-08-14 13:35:46,153 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 13:35:47,366 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 13:35:49,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2679720.0, ans=0.1 2024-08-14 13:36:00,302 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 13:36:02,920 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-14 13:36:03,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2679820.0, ans=0.5 2024-08-14 13:36:06,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2679820.0, ans=0.5 2024-08-14 13:36:14,317 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 13:36:30,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7150, loss[loss=0.1092, beats_loss=0.009407, ecapa_loss=0.0001692, whisper_loss=0.09814, over 21901.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001543, whisper_loss=0.09133, over 3890680.14 frames. ], batch size: 87, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:36:39,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2680020.0, ans=0.0 2024-08-14 13:36:55,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. 
limit=15.0 2024-08-14 13:37:01,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2680120.0, ans=0.0 2024-08-14 13:37:07,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0 2024-08-14 13:37:23,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2680320.0, ans=0.125 2024-08-14 13:37:28,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2680320.0, ans=0.125 2024-08-14 13:37:36,896 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 13:37:44,474 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 13:37:47,346 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 13:37:52,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7200, loss[loss=0.1093, beats_loss=0.01098, ecapa_loss=0.0001441, whisper_loss=0.09692, over 22960.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001541, whisper_loss=0.09171, over 3910498.95 frames. ], batch size: 92, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:37:53,003 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
36 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-14 13:38:04,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.341e+01 2.648e+01 2.948e+01 9.250e+01, threshold=5.295e+01, percent-clipped=2.0 2024-08-14 13:38:06,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2680620.0, ans=0.125 2024-08-14 13:38:27,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2680720.0, ans=0.125 2024-08-14 13:38:44,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-08-14 13:38:46,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2680820.0, ans=0.95 2024-08-14 13:39:06,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2681020.0, ans=0.1 2024-08-14 13:39:07,576 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7250, loss[loss=0.09065, beats_loss=0.01135, ecapa_loss=0.000111, whisper_loss=0.07819, over 17383.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.000155, whisper_loss=0.09196, over 3949339.58 frames. ], batch size: 67, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:39:10,821 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 13:39:20,193 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-14 13:39:26,295 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-14 13:39:42,497 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 13:39:46,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0 2024-08-14 13:40:04,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2681320.0, ans=0.125 2024-08-14 13:40:04,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2681320.0, ans=0.125 2024-08-14 13:40:10,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2681420.0, ans=0.0 2024-08-14 13:40:10,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2681420.0, ans=0.125 2024-08-14 13:40:17,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2681420.0, ans=0.0 2024-08-14 13:40:19,753 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 13:40:21,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7300, loss[loss=0.1101, beats_loss=0.009668, ecapa_loss=0.0001461, whisper_loss=0.09901, over 16588.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01067, ecapa_loss=0.0001551, whisper_loss=0.09262, over 3959489.23 frames. ], batch size: 64, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:40:33,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.289e+01 2.573e+01 2.951e+01 1.378e+02, threshold=5.146e+01, percent-clipped=1.0 2024-08-14 13:40:41,058 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 13:40:42,496 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
25 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-14 13:40:51,386 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-14 13:40:55,718 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 12 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-14 13:41:08,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2024-08-14 13:41:18,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2681820.0, ans=0.2 2024-08-14 13:41:27,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2681920.0, ans=0.0 2024-08-14 13:41:30,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2681920.0, ans=0.0 2024-08-14 13:41:31,995 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 13:41:36,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7350, loss[loss=0.08995, beats_loss=0.01218, ecapa_loss=0.0001484, whisper_loss=0.07629, over 20632.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001553, whisper_loss=0.0921, over 3956512.51 frames. ], batch size: 83, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:41:41,360 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 13:41:43,978 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 13:41:53,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2682120.0, ans=0.125 2024-08-14 13:41:56,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-14 13:42:10,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2682220.0, ans=15.0 2024-08-14 13:42:16,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2682220.0, ans=0.0 2024-08-14 13:42:21,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2682320.0, ans=0.125 2024-08-14 13:42:30,276 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 13:42:34,605 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-14 13:42:36,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-08-14 13:42:45,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2682420.0, ans=0.2 2024-08-14 13:42:49,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2682520.0, ans=0.1 2024-08-14 13:42:50,691 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7400, loss[loss=0.09037, beats_loss=0.01178, ecapa_loss=0.0001371, whisper_loss=0.07722, over 16189.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09196, over 3953524.78 frames. 
], batch size: 64, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:42:55,241 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-14 13:43:01,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2682520.0, ans=0.125 2024-08-14 13:43:02,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.321e+01 2.551e+01 2.887e+01 1.021e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 13:43:07,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2682620.0, ans=0.2 2024-08-14 13:43:30,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2682720.0, ans=0.2 2024-08-14 13:43:38,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-14 13:44:01,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2683020.0, ans=0.125 2024-08-14 13:44:02,387 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7450, loss[loss=0.103, beats_loss=0.009412, ecapa_loss=0.0001951, whisper_loss=0.09164, over 21919.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01071, ecapa_loss=0.000157, whisper_loss=0.09147, over 3915465.65 frames. ], batch size: 93, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:44:06,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-08-14 13:44:34,766 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-14 13:44:42,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2683220.0, ans=0.09899494936611666 2024-08-14 13:44:42,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.62 vs. limit=22.5 2024-08-14 13:45:03,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2683420.0, ans=0.125 2024-08-14 13:45:16,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7500, loss[loss=0.08739, beats_loss=0.009061, ecapa_loss=0.0001774, whisper_loss=0.07656, over 16081.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001566, whisper_loss=0.09173, over 3906275.15 frames. ], batch size: 65, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:45:21,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2683520.0, ans=0.125 2024-08-14 13:45:28,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.296e+01 2.546e+01 2.865e+01 4.082e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-14 13:45:47,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2683720.0, ans=0.125 2024-08-14 13:46:22,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2683920.0, ans=0.025 2024-08-14 13:46:27,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2683920.0, ans=0.125 2024-08-14 13:46:32,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7550, loss[loss=0.1073, beats_loss=0.009451, ecapa_loss=0.0001189, whisper_loss=0.09666, over 17839.00 
frames. ], tot_loss[loss=0.1043, beats_loss=0.01054, ecapa_loss=0.0001563, whisper_loss=0.09216, over 3868703.83 frames. ], batch size: 66, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:46:34,412 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 13:46:42,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.52 vs. limit=10.0 2024-08-14 13:46:46,316 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 20 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-14 13:46:52,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2684120.0, ans=0.125 2024-08-14 13:47:04,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2024-08-14 13:47:45,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2684520.0, ans=0.0 2024-08-14 13:47:46,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7600, loss[loss=0.1226, beats_loss=0.009591, ecapa_loss=0.0001775, whisper_loss=0.1113, over 22634.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0106, ecapa_loss=0.0001556, whisper_loss=0.09188, over 3885954.30 frames. 
], batch size: 89, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:47:53,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2684520.0, ans=0.0 2024-08-14 13:47:58,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.371e+01 2.546e+01 2.782e+01 5.094e+01, threshold=5.091e+01, percent-clipped=1.0 2024-08-14 13:48:11,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2684620.0, ans=0.125 2024-08-14 13:48:12,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2684620.0, ans=0.0 2024-08-14 13:48:18,538 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.495e+01 2024-08-14 13:48:27,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2684720.0, ans=15.0 2024-08-14 13:48:31,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2684820.0, ans=0.125 2024-08-14 13:48:32,848 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 13:48:40,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2684820.0, ans=0.0 2024-08-14 13:49:00,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7650, loss[loss=0.101, beats_loss=0.01005, ecapa_loss=0.0001718, whisper_loss=0.08919, over 19124.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01053, ecapa_loss=0.000155, whisper_loss=0.09192, over 3891176.76 frames. ], batch size: 79, lr: 3.29e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:49:02,411 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-14 13:49:05,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2685020.0, ans=0.025 2024-08-14 13:49:11,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2685020.0, ans=0.125 2024-08-14 13:49:24,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2685120.0, ans=0.125 2024-08-14 13:49:39,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2685220.0, ans=0.1 2024-08-14 13:49:41,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.84 vs. limit=10.0 2024-08-14 13:49:45,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2685320.0, ans=0.125 2024-08-14 13:49:46,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2685320.0, ans=0.2 2024-08-14 13:50:01,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2685420.0, ans=0.0 2024-08-14 13:50:07,149 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 13:50:08,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2685420.0, ans=0.125 2024-08-14 13:50:13,951 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7700, loss[loss=0.1057, beats_loss=0.008723, ecapa_loss=0.0001519, whisper_loss=0.09546, over 22707.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.000154, whisper_loss=0.09134, over 3896479.45 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:50:21,942 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-14 13:50:25,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.559e+01 2.371e+01 2.640e+01 3.039e+01 4.657e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-14 13:50:26,259 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 13:50:27,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2685620.0, ans=0.1 2024-08-14 13:50:31,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2024-08-14 13:50:35,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2685620.0, ans=0.125 2024-08-14 13:50:36,499 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 13:50:54,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2685720.0, ans=0.1 2024-08-14 13:51:26,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7750, loss[loss=0.1159, beats_loss=0.0108, ecapa_loss=0.0001347, whisper_loss=0.1038, over 22909.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001532, whisper_loss=0.0909, over 3875987.13 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:51:39,100 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
30 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 13:51:41,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2686120.0, ans=0.0 2024-08-14 13:51:51,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2024-08-14 13:51:57,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2686220.0, ans=0.0 2024-08-14 13:52:03,027 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 36 from Vox, 33 fro AS 2024-08-14 13:52:13,077 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 13:52:26,705 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 13:52:37,961 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 13:52:40,681 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7800, loss[loss=0.1309, beats_loss=0.008538, ecapa_loss=0.000152, whisper_loss=0.1208, over 22414.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001539, whisper_loss=0.09126, over 3898540.44 frames. ], batch size: 85, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:52:47,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2686520.0, ans=0.125 2024-08-14 13:52:52,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.425e+01 2.611e+01 2.883e+01 9.855e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-14 13:53:01,352 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 13:53:04,539 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
29 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 13:53:18,695 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 13:53:21,511 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-14 13:53:28,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2686820.0, ans=0.2 2024-08-14 13:53:36,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.67 vs. limit=15.0 2024-08-14 13:53:41,217 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-14 13:53:45,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2686920.0, ans=0.2 2024-08-14 13:53:52,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2686920.0, ans=0.1 2024-08-14 13:53:54,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7850, loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001425, whisper_loss=0.09231, over 23704.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001542, whisper_loss=0.09102, over 3910846.09 frames. ], batch size: 94, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:53:55,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=22.5 2024-08-14 13:54:17,854 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 13:54:21,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2687120.0, ans=0.09899494936611666 2024-08-14 13:54:25,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2687220.0, ans=0.2 2024-08-14 13:54:32,967 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 33 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 13:54:33,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2687220.0, ans=0.0 2024-08-14 13:54:42,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=15.0 2024-08-14 13:54:48,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2687320.0, ans=0.035 2024-08-14 13:54:54,289 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 20 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-14 13:54:59,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2687420.0, ans=0.125 2024-08-14 13:55:06,459 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 13:55:08,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7900, loss[loss=0.08595, beats_loss=0.01085, ecapa_loss=0.0001332, whisper_loss=0.07376, over 15041.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01075, ecapa_loss=0.0001538, whisper_loss=0.09123, over 3894868.51 frames. 
], batch size: 57, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:55:11,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2687520.0, ans=0.0 2024-08-14 13:55:20,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.378e+01 2.612e+01 2.895e+01 1.059e+02, threshold=5.225e+01, percent-clipped=1.0 2024-08-14 13:55:21,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0 2024-08-14 13:55:33,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2687620.0, ans=0.125 2024-08-14 13:55:39,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2687720.0, ans=0.125 2024-08-14 13:55:42,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2687720.0, ans=0.125 2024-08-14 13:55:52,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2687820.0, ans=0.125 2024-08-14 13:56:02,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2687820.0, ans=0.125 2024-08-14 13:56:14,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2687920.0, ans=0.2 2024-08-14 13:56:22,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 7950, loss[loss=0.1085, beats_loss=0.01047, ecapa_loss=0.0001756, whisper_loss=0.09631, over 22684.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001545, whisper_loss=0.09094, over 3914774.47 frames. 
], batch size: 92, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:56:31,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2688020.0, ans=0.125 2024-08-14 13:56:51,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2688220.0, ans=0.0 2024-08-14 13:57:32,037 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 13:57:37,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8000, loss[loss=0.1033, beats_loss=0.01014, ecapa_loss=0.0001622, whisper_loss=0.09156, over 19480.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001541, whisper_loss=0.09096, over 3886172.86 frames. ], batch size: 79, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:57:48,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2688520.0, ans=0.2 2024-08-14 13:57:48,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.382e+01 2.629e+01 3.053e+01 3.860e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 13:57:58,217 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 13:57:58,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2688620.0, ans=0.0 2024-08-14 13:58:07,430 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 13:58:14,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2688720.0, ans=0.0 2024-08-14 13:58:18,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.00 vs. 
limit=12.0 2024-08-14 13:58:50,855 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8050, loss[loss=0.1102, beats_loss=0.009836, ecapa_loss=0.0001763, whisper_loss=0.09865, over 20742.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01076, ecapa_loss=0.0001545, whisper_loss=0.09071, over 3887634.28 frames. ], batch size: 83, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 13:59:14,755 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 13:59:40,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-14 14:00:03,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8100, loss[loss=0.09172, beats_loss=0.008967, ecapa_loss=0.0002215, whisper_loss=0.08054, over 15683.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001553, whisper_loss=0.09121, over 3898952.15 frames. ], batch size: 68, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:00:14,627 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-14 14:00:15,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.342e+01 2.614e+01 2.950e+01 9.116e+01, threshold=5.228e+01, percent-clipped=3.0 2024-08-14 14:00:17,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2689620.0, ans=0.125 2024-08-14 14:00:22,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. 
limit=15.0 2024-08-14 14:00:46,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2689820.0, ans=0.2 2024-08-14 14:00:55,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-14 14:00:57,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-14 14:01:09,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2689920.0, ans=0.125 2024-08-14 14:01:13,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2689920.0, ans=0.125 2024-08-14 14:01:15,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2690020.0, ans=0.2 2024-08-14 14:01:15,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8150, loss[loss=0.09511, beats_loss=0.01148, ecapa_loss=0.0001358, whisper_loss=0.08227, over 22114.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0001545, whisper_loss=0.09085, over 3899344.79 frames. ], batch size: 90, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:01:27,977 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-14 14:01:29,609 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 14:01:33,842 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 14:01:36,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2690120.0, ans=0.125 2024-08-14 14:01:49,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2690220.0, ans=0.0 2024-08-14 14:01:59,772 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-14 14:02:04,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-14 14:02:13,414 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-14 14:02:29,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8200, loss[loss=0.1013, beats_loss=0.01209, ecapa_loss=0.0001525, whisper_loss=0.08768, over 19323.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.0001556, whisper_loss=0.09122, over 3895784.80 frames. ], batch size: 77, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:02:29,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2690520.0, ans=0.5 2024-08-14 14:02:34,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2690520.0, ans=0.125 2024-08-14 14:02:40,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.302e+01 2.493e+01 2.763e+01 4.005e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-14 14:02:49,691 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 14:03:17,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2690820.0, ans=0.125 2024-08-14 14:03:35,082 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 14:03:41,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2691020.0, ans=0.125 2024-08-14 14:03:42,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8250, loss[loss=0.1132, beats_loss=0.01123, ecapa_loss=0.0001264, whisper_loss=0.1007, over 20898.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01061, ecapa_loss=0.0001565, whisper_loss=0.09154, over 3905628.33 frames. ], batch size: 80, lr: 3.28e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 14:03:50,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2024-08-14 14:03:51,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2691020.0, ans=0.1 2024-08-14 14:03:55,365 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 14:03:59,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2691120.0, ans=0.125 2024-08-14 14:04:00,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2691120.0, ans=0.0 2024-08-14 14:04:09,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2691120.0, ans=0.0 2024-08-14 14:04:12,479 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 14:04:33,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:37,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=8.0 2024-08-14 14:04:38,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:38,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2691320.0, ans=0.125 2024-08-14 14:04:40,847 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 33 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 14:04:46,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0 2024-08-14 14:04:55,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2024-08-14 14:04:56,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8300, loss[loss=0.05144, beats_loss=0.01289, ecapa_loss=0.0001586, whisper_loss=0.03696, over 12838.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001545, whisper_loss=0.09098, over 3896466.95 frames. ], batch size: 54, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:05:07,007 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 14:05:08,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.406e+01 2.618e+01 2.998e+01 6.409e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-14 14:05:08,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2691520.0, ans=0.2 2024-08-14 14:05:22,914 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-14 14:05:30,771 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 14:05:34,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2691720.0, ans=0.0 2024-08-14 14:05:39,430 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-14 14:05:54,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2024-08-14 14:06:10,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8350, loss[loss=0.1047, beats_loss=0.009889, ecapa_loss=0.0001716, whisper_loss=0.09312, over 22321.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001552, whisper_loss=0.0908, over 3897466.00 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:06:14,486 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 14:06:15,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-08-14 14:06:15,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2692020.0, ans=6.0 2024-08-14 14:06:25,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=10.0 2024-08-14 14:06:38,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2692120.0, ans=0.125 2024-08-14 14:06:54,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2692220.0, ans=0.0 2024-08-14 14:06:57,706 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-14 14:07:30,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8400, loss[loss=0.08723, beats_loss=0.01002, ecapa_loss=0.0002013, whisper_loss=0.0752, over 18849.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001563, whisper_loss=0.09078, over 3891061.16 frames. ], batch size: 81, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:07:34,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2692520.0, ans=0.2 2024-08-14 14:07:38,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2692520.0, ans=0.0 2024-08-14 14:07:43,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.388e+01 2.632e+01 2.972e+01 1.432e+02, threshold=5.263e+01, percent-clipped=3.0 2024-08-14 14:07:54,732 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-14 14:07:55,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.82 vs. 
limit=15.0 2024-08-14 14:07:59,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2024-08-14 14:08:08,309 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 14:08:14,464 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 14:08:21,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2692820.0, ans=0.0 2024-08-14 14:08:31,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2692820.0, ans=0.125 2024-08-14 14:08:35,594 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 14:08:47,754 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 14:08:48,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8450, loss[loss=0.1112, beats_loss=0.009052, ecapa_loss=0.000161, whisper_loss=0.1006, over 16099.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001577, whisper_loss=0.09137, over 3898860.32 frames. ], batch size: 64, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:08:53,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2024-08-14 14:09:03,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=12.0 2024-08-14 14:09:06,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2693120.0, ans=0.1 2024-08-14 14:09:21,894 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-14 14:09:26,790 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-14 14:09:37,574 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 14:09:40,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2693320.0, ans=0.0 2024-08-14 14:09:50,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2693420.0, ans=0.125 2024-08-14 14:09:52,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0 2024-08-14 14:09:55,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2693420.0, ans=0.125 2024-08-14 14:10:01,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=12.0 2024-08-14 14:10:06,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8500, loss[loss=0.09901, beats_loss=0.01053, ecapa_loss=0.0001616, whisper_loss=0.08687, over 21320.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0105, ecapa_loss=0.0001568, whisper_loss=0.09206, over 3904354.86 frames. 
], batch size: 85, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:10:19,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.292e+01 2.601e+01 3.025e+01 1.070e+02, threshold=5.203e+01, percent-clipped=1.0 2024-08-14 14:10:26,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2693620.0, ans=0.035 2024-08-14 14:11:03,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2693820.0, ans=0.1 2024-08-14 14:11:05,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2024-08-14 14:11:19,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2024-08-14 14:11:26,242 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 14:11:27,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8550, loss[loss=0.1147, beats_loss=0.01153, ecapa_loss=0.0001494, whisper_loss=0.1017, over 21807.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0105, ecapa_loss=0.0001562, whisper_loss=0.0924, over 3917199.34 frames. ], batch size: 87, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:11:38,301 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 14:11:38,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2694020.0, ans=10.0 2024-08-14 14:11:46,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2694120.0, ans=0.025 2024-08-14 14:12:01,088 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 14:12:02,868 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 14:12:06,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2694220.0, ans=0.0 2024-08-14 14:12:07,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2694220.0, ans=0.125 2024-08-14 14:12:10,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2694220.0, ans=0.125 2024-08-14 14:12:14,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2694320.0, ans=0.1 2024-08-14 14:12:23,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2694320.0, ans=0.125 2024-08-14 14:12:30,661 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 14:12:45,507 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8600, loss[loss=0.121, beats_loss=0.009583, ecapa_loss=0.0001879, whisper_loss=0.1096, over 22049.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01048, ecapa_loss=0.0001563, whisper_loss=0.0924, over 3882752.44 frames. ], batch size: 88, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:12:48,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2694520.0, ans=0.125 2024-08-14 14:12:57,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.473e+01 2.757e+01 3.150e+01 4.170e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-14 14:13:01,125 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 14:13:03,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2694620.0, ans=0.0 2024-08-14 14:13:04,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2694620.0, ans=0.0 2024-08-14 14:13:09,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-14 14:13:12,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2694620.0, ans=0.125 2024-08-14 14:13:21,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-14 14:13:33,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2694820.0, ans=0.0 2024-08-14 14:13:43,044 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 14:13:47,445 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 14:13:54,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2694920.0, ans=0.125 2024-08-14 14:14:03,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8650, loss[loss=0.1157, beats_loss=0.01153, ecapa_loss=0.0001563, whisper_loss=0.1026, over 21799.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001557, whisper_loss=0.0917, over 3880556.97 frames. 
], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:14:07,215 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 14:14:37,335 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-14 14:14:39,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2695220.0, ans=0.125 2024-08-14 14:15:17,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2695520.0, ans=0.125 2024-08-14 14:15:18,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8700, loss[loss=0.09687, beats_loss=0.01145, ecapa_loss=0.000149, whisper_loss=0.08392, over 18821.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001556, whisper_loss=0.09187, over 3896002.94 frames. ], batch size: 71, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:15:30,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.361e+01 2.667e+01 2.943e+01 6.389e+01, threshold=5.334e+01, percent-clipped=1.0 2024-08-14 14:15:30,698 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 14:15:32,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2695620.0, ans=0.2 2024-08-14 14:15:33,582 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-14 14:15:53,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2695720.0, ans=0.09899494936611666 2024-08-14 14:16:31,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2696020.0, ans=0.125 2024-08-14 14:16:31,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8750, loss[loss=0.1105, beats_loss=0.01009, ecapa_loss=0.0001598, whisper_loss=0.09885, over 17422.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01048, ecapa_loss=0.0001565, whisper_loss=0.09172, over 3852348.06 frames. ], batch size: 75, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:16:47,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2696120.0, ans=0.95 2024-08-14 14:16:50,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2696120.0, ans=0.0 2024-08-14 14:16:57,076 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 14:16:59,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-14 14:17:00,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2696220.0, ans=0.125 2024-08-14 14:17:30,023 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 14:17:31,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2696420.0, ans=0.125 2024-08-14 14:17:38,625 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-14 14:17:41,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2696420.0, ans=0.125 2024-08-14 14:17:44,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8800, loss[loss=0.08237, beats_loss=0.01544, ecapa_loss=0.0001242, whisper_loss=0.06569, over 22066.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001552, whisper_loss=0.09139, over 3926477.35 frames. ], batch size: 90, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:17:46,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2696520.0, ans=0.025 2024-08-14 14:17:55,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.470e+01 2.757e+01 3.014e+01 7.462e+01, threshold=5.513e+01, percent-clipped=1.0 2024-08-14 14:18:26,794 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 14:18:54,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2696920.0, ans=0.125 2024-08-14 14:18:58,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8850, loss[loss=0.1132, beats_loss=0.01003, ecapa_loss=0.0001704, whisper_loss=0.1015, over 16343.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001555, whisper_loss=0.09074, over 3896416.82 frames. ], batch size: 68, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:19:36,582 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 14:19:36,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2697220.0, ans=0.125 2024-08-14 14:19:47,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2697320.0, ans=0.2 2024-08-14 14:19:48,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-14 14:19:58,854 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 14:20:01,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2697420.0, ans=0.0 2024-08-14 14:20:11,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8900, loss[loss=0.1067, beats_loss=0.01227, ecapa_loss=0.0001282, whisper_loss=0.09315, over 21561.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001546, whisper_loss=0.09091, over 3883776.93 frames. ], batch size: 85, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:20:23,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.627e+01 2.296e+01 2.497e+01 2.712e+01 4.460e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-14 14:20:28,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2024-08-14 14:20:30,717 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 14:20:52,869 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 14:21:07,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2697820.0, ans=0.95 2024-08-14 14:21:08,173 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-14 14:21:21,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2697920.0, ans=0.125 2024-08-14 14:21:25,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 8950, loss[loss=0.1256, beats_loss=0.009339, ecapa_loss=0.0001598, whisper_loss=0.1147, over 22982.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01077, ecapa_loss=0.0001539, whisper_loss=0.09111, over 3891919.30 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:21:26,067 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 14:21:47,655 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 14:21:49,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2698120.0, ans=0.125 2024-08-14 14:22:07,092 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 14:22:20,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2698320.0, ans=0.1 2024-08-14 14:22:23,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2698420.0, ans=0.0 2024-08-14 14:22:26,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2698420.0, ans=0.0 2024-08-14 14:22:36,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2698420.0, ans=0.125 2024-08-14 14:22:39,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9000, loss[loss=0.1149, beats_loss=0.009974, ecapa_loss=0.0001785, whisper_loss=0.1031, over 21921.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001551, whisper_loss=0.09095, over 3889678.88 frames. ], batch size: 89, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:22:39,091 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 14:23:17,725 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005393, whisper_loss=0.2473, over 922467.00 frames. 2024-08-14 14:23:35,669 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on SV_voxceleb1: loss=0.00426, beats_loss=0, ecapa_loss=0.000426, whisper_loss=0, over 939242.00 frames. 2024-08-14 14:25:24,122 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on AT_audioset: loss=0.02357, beats_loss=0.02357, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 14:25:24,126 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 14:25:27,394 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 14:25:35,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.561e+01 2.926e+01 5.640e+01, threshold=5.122e+01, percent-clipped=1.0 2024-08-14 14:25:37,790 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 14:25:38,924 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 14:25:44,990 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 14:25:49,223 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 14:25:54,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2698720.0, ans=0.125 2024-08-14 14:26:00,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.50 vs. limit=22.5 2024-08-14 14:26:03,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2698720.0, ans=0.125 2024-08-14 14:26:05,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2024-08-14 14:26:12,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2698820.0, ans=0.125 2024-08-14 14:26:15,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2698820.0, ans=0.125 2024-08-14 14:26:21,461 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 14:26:34,478 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 14:26:38,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9050, loss[loss=0.1053, beats_loss=0.009349, ecapa_loss=0.000168, whisper_loss=0.09427, over 22421.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001561, whisper_loss=0.09081, over 3870338.71 frames. ], batch size: 91, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:26:54,160 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 14:27:23,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2699320.0, ans=0.125 2024-08-14 14:27:27,832 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 14:27:47,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=12.0 2024-08-14 14:27:48,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2699420.0, ans=0.0 2024-08-14 14:27:51,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2699520.0, ans=0.2 2024-08-14 14:27:52,352 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9100, loss[loss=0.08681, beats_loss=0.01041, ecapa_loss=0.0001639, whisper_loss=0.07476, over 16375.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001564, whisper_loss=0.09118, over 3900967.00 frames. 
], batch size: 69, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:28:04,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.245e+01 2.534e+01 2.882e+01 3.902e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-14 14:28:11,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2699620.0, ans=0.125 2024-08-14 14:28:19,560 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 14:28:19,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2699620.0, ans=0.125 2024-08-14 14:28:47,660 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 14:29:02,371 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 14:29:06,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2024-08-14 14:29:06,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9150, loss[loss=0.1079, beats_loss=0.01142, ecapa_loss=0.0001446, whisper_loss=0.095, over 22824.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001568, whisper_loss=0.09109, over 3918245.29 frames. ], batch size: 92, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:29:10,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2700020.0, ans=0.2 2024-08-14 14:29:26,786 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 14:29:35,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2700220.0, ans=0.125 2024-08-14 14:29:36,262 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-14 14:29:43,439 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 14:30:05,904 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 14:30:09,950 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 14:30:19,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9200, loss[loss=0.09182, beats_loss=0.01329, ecapa_loss=0.0001266, whisper_loss=0.07727, over 20192.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001558, whisper_loss=0.09132, over 3930736.73 frames. ], batch size: 82, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:30:23,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2700520.0, ans=0.2 2024-08-14 14:30:27,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2700520.0, ans=0.125 2024-08-14 14:30:30,024 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 14:30:31,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.280e+01 2.601e+01 2.975e+01 5.180e+01, threshold=5.201e+01, percent-clipped=1.0 2024-08-14 14:30:43,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2700620.0, ans=0.125 2024-08-14 14:30:49,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2700720.0, ans=0.0 2024-08-14 14:30:56,220 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 14:31:25,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2700920.0, ans=0.0 2024-08-14 14:31:28,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2700920.0, ans=0.125 2024-08-14 14:31:28,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2700920.0, ans=0.125 2024-08-14 14:31:31,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2701020.0, ans=0.0 2024-08-14 14:31:31,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9250, loss[loss=0.08386, beats_loss=0.01236, ecapa_loss=0.0001767, whisper_loss=0.06973, over 15991.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01073, ecapa_loss=0.0001561, whisper_loss=0.09124, over 3906094.60 frames. ], batch size: 72, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:32:07,520 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 14:32:08,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2701220.0, ans=0.0 2024-08-14 14:32:13,542 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 14:32:20,899 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 14:32:31,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2701420.0, ans=0.0 2024-08-14 14:32:40,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2701420.0, ans=0.125 2024-08-14 14:32:41,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2701420.0, ans=0.0 2024-08-14 14:32:43,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9300, loss[loss=0.1225, beats_loss=0.009826, ecapa_loss=0.000177, whisper_loss=0.1109, over 19895.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01073, ecapa_loss=0.0001559, whisper_loss=0.09118, over 3899659.28 frames. ], batch size: 82, lr: 3.28e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:32:56,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.362e+01 2.551e+01 2.899e+01 4.764e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-14 14:32:58,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2701620.0, ans=0.0 2024-08-14 14:33:06,065 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 14:33:10,658 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 14:33:19,313 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 14:33:24,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2701720.0, ans=0.0 2024-08-14 14:33:41,649 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 14:33:50,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2701920.0, ans=0.1 2024-08-14 14:33:52,113 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 14:33:53,415 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-14 14:33:57,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9350, loss[loss=0.09487, beats_loss=0.01102, ecapa_loss=0.0001776, whisper_loss=0.08207, over 17344.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01058, ecapa_loss=0.0001572, whisper_loss=0.0918, over 3859359.51 frames. ], batch size: 71, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:33:57,973 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 14:34:03,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2024-08-14 14:34:05,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2702020.0, ans=0.125 2024-08-14 14:34:11,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2702120.0, ans=0.125 2024-08-14 14:34:30,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-08-14 14:34:47,354 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-14 14:34:57,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-14 14:34:58,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2702420.0, ans=0.5 2024-08-14 14:35:11,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9400, loss[loss=0.09245, beats_loss=0.01466, ecapa_loss=0.0001207, whisper_loss=0.07659, over 22016.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001554, whisper_loss=0.09092, over 3872072.90 frames. ], batch size: 89, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:35:12,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2702520.0, ans=0.2 2024-08-14 14:35:13,544 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 14:35:19,494 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 14:35:22,466 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 14:35:23,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.407e+01 2.622e+01 2.905e+01 1.999e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-14 14:35:40,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=15.0 2024-08-14 14:35:56,013 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 14:35:56,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2702820.0, ans=0.125 2024-08-14 14:36:09,325 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 14:36:11,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-08-14 14:36:17,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2702920.0, ans=0.125 2024-08-14 14:36:23,989 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 14:36:25,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9450, loss[loss=0.09704, beats_loss=0.01244, ecapa_loss=0.0001402, whisper_loss=0.08319, over 18360.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01076, ecapa_loss=0.0001555, whisper_loss=0.09077, over 3866187.57 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:36:59,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2703220.0, ans=0.0 2024-08-14 14:37:05,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2703220.0, ans=0.0 2024-08-14 14:37:21,258 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-14 14:37:28,751 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 28 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-14 14:37:37,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9500, loss[loss=0.09481, beats_loss=0.00927, ecapa_loss=0.0001973, whisper_loss=0.08356, over 21086.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001553, whisper_loss=0.09073, over 3867269.74 frames. 
], batch size: 88, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:37:48,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.397e+01 2.649e+01 2.966e+01 9.786e+01, threshold=5.299e+01, percent-clipped=1.0 2024-08-14 14:37:54,832 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 14:37:58,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-14 14:38:12,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2703720.0, ans=0.125 2024-08-14 14:38:14,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2703720.0, ans=0.1 2024-08-14 14:38:20,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2703820.0, ans=0.1 2024-08-14 14:38:26,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-08-14 14:38:31,890 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 14:38:32,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2703820.0, ans=0.125 2024-08-14 14:38:50,517 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9550, loss[loss=0.1185, beats_loss=0.009383, ecapa_loss=0.0001383, whisper_loss=0.1077, over 24190.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001564, whisper_loss=0.0912, over 3881063.82 frames. 
], batch size: 94, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:38:52,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2704020.0, ans=0.125 2024-08-14 14:39:05,179 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-14 14:39:11,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2024-08-14 14:39:12,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2704120.0, ans=0.1 2024-08-14 14:39:15,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2704120.0, ans=0.125 2024-08-14 14:39:16,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2024-08-14 14:39:31,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2704220.0, ans=0.2 2024-08-14 14:39:33,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-14 14:39:43,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2704320.0, ans=0.2 2024-08-14 14:39:44,854 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 14:39:47,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2704320.0, ans=0.0 2024-08-14 14:39:51,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2704320.0, ans=0.2 2024-08-14 14:39:52,856 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 14:39:55,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2024-08-14 14:40:02,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2704420.0, ans=0.125 2024-08-14 14:40:12,014 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-14 14:40:17,098 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9600, loss[loss=0.1061, beats_loss=0.01135, ecapa_loss=0.000137, whisper_loss=0.09333, over 19365.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001553, whisper_loss=0.091, over 3870784.99 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:40:18,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2704520.0, ans=0.125 2024-08-14 14:40:22,892 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 14:40:31,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.443e+01 2.792e+01 3.086e+01 6.637e+01, threshold=5.584e+01, percent-clipped=2.0 2024-08-14 14:40:34,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2704620.0, ans=0.07 2024-08-14 14:40:35,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2704620.0, ans=0.0 2024-08-14 14:40:46,236 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 14:40:49,095 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-14 14:40:52,606 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 14:41:08,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2704720.0, ans=0.0 2024-08-14 14:41:23,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2704820.0, ans=0.125 2024-08-14 14:41:25,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0 2024-08-14 14:41:36,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2024-08-14 14:41:48,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9650, loss[loss=0.1041, beats_loss=0.01066, ecapa_loss=0.0001628, whisper_loss=0.09179, over 19229.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09031, over 3843433.85 frames. 
], batch size: 76, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:41:54,281 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 14:42:02,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.41 vs. limit=10.0 2024-08-14 14:42:04,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2705020.0, ans=0.0 2024-08-14 14:42:28,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2705220.0, ans=0.125 2024-08-14 14:42:37,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2705320.0, ans=0.0 2024-08-14 14:42:59,358 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 14:43:02,080 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 14:43:05,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9700, loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0001635, whisper_loss=0.09512, over 18747.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001561, whisper_loss=0.09042, over 3816074.55 frames. ], batch size: 77, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:43:17,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.210e+01 2.464e+01 2.850e+01 7.455e+01, threshold=4.928e+01, percent-clipped=1.0 2024-08-14 14:43:28,647 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
10 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-14 14:43:38,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2705720.0, ans=0.025 2024-08-14 14:43:42,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2705720.0, ans=0.0 2024-08-14 14:43:57,120 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 14:43:57,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2705820.0, ans=0.0 2024-08-14 14:43:57,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2705820.0, ans=0.1 2024-08-14 14:44:06,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2705920.0, ans=0.125 2024-08-14 14:44:10,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2705920.0, ans=0.1 2024-08-14 14:44:20,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9750, loss[loss=0.1124, beats_loss=0.009182, ecapa_loss=0.0001779, whisper_loss=0.1014, over 22542.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001573, whisper_loss=0.08997, over 3808116.67 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:44:33,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2706020.0, ans=0.125 2024-08-14 14:44:40,860 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 14:44:52,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. 
limit=10.0 2024-08-14 14:45:00,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2706220.0, ans=0.125 2024-08-14 14:45:12,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2706320.0, ans=0.125 2024-08-14 14:45:37,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9800, loss[loss=0.09061, beats_loss=0.01263, ecapa_loss=0.0001377, whisper_loss=0.0766, over 15371.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01076, ecapa_loss=0.0001574, whisper_loss=0.08929, over 3819009.41 frames. ], batch size: 62, lr: 3.27e-03, grad_scale: 5.764607523034235e+17 2024-08-14 14:45:37,414 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 14:45:43,879 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 10 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 14:45:49,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.322e+01 2.608e+01 2.964e+01 4.916e+01, threshold=5.216e+01, percent-clipped=0.0 2024-08-14 14:46:14,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2706720.0, ans=0.1 2024-08-14 14:46:19,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2706720.0, ans=0.125 2024-08-14 14:46:31,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2706820.0, ans=0.125 2024-08-14 14:46:34,187 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 27 from Vox, 35 from AS
2024-08-14 14:46:46,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2706920.0, ans=0.1
2024-08-14 14:46:51,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9850, loss[loss=0.09445, beats_loss=0.01303, ecapa_loss=0.000185, whisper_loss=0.07957, over 17478.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001577, whisper_loss=0.08956, over 3835630.47 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:47:11,168 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.506e+05
2024-08-14 14:47:12,066 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 from AS
2024-08-14 14:47:18,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2707120.0, ans=0.1
2024-08-14 14:47:20,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2707220.0, ans=0.1
2024-08-14 14:47:21,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2707220.0, ans=0.125
2024-08-14 14:47:27,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2707220.0, ans=0.0
2024-08-14 14:47:28,714 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 11 from Vox, 51 from AS
2024-08-14 14:47:58,381 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 14:48:00,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2707420.0, ans=0.125
2024-08-14 14:48:07,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9900, loss[loss=0.07468, beats_loss=0.01208, ecapa_loss=0.0001162, whisper_loss=0.06143, over 14655.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01078, ecapa_loss=0.0001564, whisper_loss=0.09022, over 3900014.87 frames. ], batch size: 56, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:48:15,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2707520.0, ans=0.0
2024-08-14 14:48:15,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2707520.0, ans=0.125
2024-08-14 14:48:19,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.359e+01 2.713e+01 2.970e+01 4.614e+01, threshold=5.426e+01, percent-clipped=0.0
2024-08-14 14:48:24,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=15.0
2024-08-14 14:48:25,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2707620.0, ans=0.125
2024-08-14 14:48:47,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2707720.0, ans=15.0
2024-08-14 14:48:48,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=8.0
2024-08-14 14:49:00,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0
2024-08-14 14:49:16,596 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 from AS
2024-08-14 14:49:32,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2707920.0, ans=0.09899494936611666
2024-08-14 14:49:34,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2707920.0, ans=0.125
2024-08-14 14:49:38,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 9950, loss[loss=0.1128, beats_loss=0.01044, ecapa_loss=0.0001434, whisper_loss=0.1009, over 23227.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0108, ecapa_loss=0.0001559, whisper_loss=0.09009, over 3896712.23 frames. ], batch size: 92, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:49:45,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2708020.0, ans=0.125
2024-08-14 14:49:52,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0
2024-08-14 14:50:05,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2708120.0, ans=0.1
2024-08-14 14:50:24,568 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 from AS
2024-08-14 14:50:27,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2708220.0, ans=0.0
2024-08-14 14:50:45,137 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 from AS
2024-08-14 14:50:50,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2708320.0, ans=0.125
2024-08-14 14:51:15,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2708420.0, ans=0.125
2024-08-14 14:51:19,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2708420.0, ans=0.125
2024-08-14 14:51:27,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10000, loss[loss=0.1053, beats_loss=0.009244, ecapa_loss=0.0001367, whisper_loss=0.09464, over 18521.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001562, whisper_loss=0.09038, over 3888869.85 frames. ], batch size: 71, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:51:29,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2708520.0, ans=0.125
2024-08-14 14:51:46,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.366e+01 2.562e+01 2.817e+01 3.470e+01, threshold=5.124e+01, percent-clipped=0.0
2024-08-14 14:51:49,539 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-14 14:52:00,084 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 from AS
2024-08-14 14:52:12,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2708720.0, ans=0.0
2024-08-14 14:52:18,205 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS
2024-08-14 14:52:18,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-08-14 14:52:38,907 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS
2024-08-14 14:52:58,560 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10050, loss[loss=0.1227, beats_loss=0.009871, ecapa_loss=0.0001759, whisper_loss=0.1111, over 20805.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001549, whisper_loss=0.09038, over 3913567.15 frames. ], batch size: 83, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:53:06,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2709020.0, ans=0.125
2024-08-14 14:53:22,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2709120.0, ans=0.125
2024-08-14 14:53:58,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2709320.0, ans=0.125
2024-08-14 14:54:16,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10100, loss[loss=0.1078, beats_loss=0.01031, ecapa_loss=0.0001689, whisper_loss=0.09582, over 22974.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001556, whisper_loss=0.09048, over 3925267.62 frames. ], batch size: 93, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:54:23,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2709520.0, ans=0.125
2024-08-14 14:54:29,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.269e+01 2.495e+01 2.791e+01 4.696e+01, threshold=4.989e+01, percent-clipped=0.0
2024-08-14 14:54:32,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2709620.0, ans=0.125
2024-08-14 14:54:50,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.64 vs. limit=15.0
2024-08-14 14:55:08,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2709820.0, ans=0.0
2024-08-14 14:55:08,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2709820.0, ans=0.125
2024-08-14 14:55:20,680 WARNING [optim.py:496] (3/4) Scaling gradients by 0.040875811129808426, model_norm_threshold=49.8900260925293
2024-08-14 14:55:20,842 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.113e+05, grad_sumsq=3.113e+05, orig_rms_sq=1.000e+00
2024-08-14 14:55:22,542 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS
2024-08-14 14:55:33,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2710020.0, ans=0.125
2024-08-14 14:55:34,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10150, loss[loss=0.1226, beats_loss=0.008499, ecapa_loss=0.0001728, whisper_loss=0.1123, over 23256.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001555, whisper_loss=0.09067, over 3925567.68 frames. ], batch size: 91, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:55:40,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2710020.0, ans=0.0
2024-08-14 14:55:45,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.36 vs. limit=10.0
2024-08-14 14:55:47,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2710020.0, ans=0.2
2024-08-14 14:56:04,661 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 from AS
2024-08-14 14:56:11,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0
2024-08-14 14:56:20,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2710320.0, ans=0.0
2024-08-14 14:56:35,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2710420.0, ans=0.125
2024-08-14 14:56:41,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=2710420.0, ans=12.0
2024-08-14 14:56:49,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0
2024-08-14 14:56:49,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.46 vs. limit=15.0
2024-08-14 14:56:51,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10200, loss[loss=0.1073, beats_loss=0.01031, ecapa_loss=0.0001642, whisper_loss=0.09534, over 20217.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001558, whisper_loss=0.09118, over 3934350.26 frames. ], batch size: 81, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:57:04,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.342e+01 2.619e+01 2.972e+01 1.221e+03, threshold=5.239e+01, percent-clipped=2.0
2024-08-14 14:57:17,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2710620.0, ans=0.1
2024-08-14 14:57:25,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2710720.0, ans=0.125
2024-08-14 14:57:38,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0
2024-08-14 14:57:49,468 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 from AS
2024-08-14 14:57:51,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2710920.0, ans=0.125
2024-08-14 14:57:58,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2024-08-14 14:58:05,646 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-14 14:58:08,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10250, loss[loss=0.1021, beats_loss=0.009802, ecapa_loss=0.0001652, whisper_loss=0.09061, over 18773.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01074, ecapa_loss=0.0001549, whisper_loss=0.09123, over 3940784.04 frames. ], batch size: 72, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 14:58:16,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2711020.0, ans=0.0
2024-08-14 14:58:45,011 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS
2024-08-14 14:58:57,656 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS
2024-08-14 14:58:58,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2711320.0, ans=0.0
2024-08-14 14:59:00,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0
2024-08-14 14:59:10,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2711420.0, ans=0.125
2024-08-14 14:59:17,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2711420.0, ans=0.125
2024-08-14 14:59:19,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2711420.0, ans=0.0
2024-08-14 14:59:24,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2711420.0, ans=0.0
2024-08-14 14:59:25,221 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 14:59:29,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10300, loss[loss=0.09997, beats_loss=0.009749, ecapa_loss=0.0001488, whisper_loss=0.08873, over 22480.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001551, whisper_loss=0.09043, over 3946695.08 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 1.152921504606847e+18
2024-08-14 14:59:35,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2711520.0, ans=0.125
2024-08-14 14:59:37,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2711520.0, ans=10.0
2024-08-14 14:59:38,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2711520.0, ans=0.125
2024-08-14 14:59:41,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.309e+01 2.627e+01 3.015e+01 4.712e+01, threshold=5.254e+01, percent-clipped=0.0
2024-08-14 14:59:59,798 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS
2024-08-14 15:00:02,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2711720.0, ans=0.2
2024-08-14 15:00:07,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2024-08-14 15:00:21,864 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.466e-02
2024-08-14 15:00:38,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2024-08-14 15:00:42,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2711920.0, ans=0.025
2024-08-14 15:00:43,334 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 from AS
2024-08-14 15:00:54,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10350, loss[loss=0.089, beats_loss=0.01186, ecapa_loss=0.0001601, whisper_loss=0.07555, over 20881.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001563, whisper_loss=0.09039, over 3959207.60 frames. ], batch size: 87, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:00:59,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0
2024-08-14 15:01:29,932 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS
2024-08-14 15:01:57,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0
2024-08-14 15:01:57,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0
2024-08-14 15:02:12,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10400, loss[loss=0.07716, beats_loss=0.01319, ecapa_loss=0.0001175, whisper_loss=0.0628, over 13874.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001555, whisper_loss=0.09017, over 3894154.27 frames. ], batch size: 54, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:02:25,055 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.801e+01
2024-08-14 15:02:25,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.638e+01 3.125e+01 4.616e+01, threshold=5.275e+01, percent-clipped=0.0
2024-08-14 15:02:34,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2712620.0, ans=0.1
2024-08-14 15:02:51,959 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 from AS
2024-08-14 15:03:11,388 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 from AS
2024-08-14 15:03:12,819 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 33 from Vox, 34 from AS
2024-08-14 15:03:15,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2712920.0, ans=0.125
2024-08-14 15:03:22,828 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 15:03:23,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2712920.0, ans=0.07
2024-08-14 15:03:24,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2712920.0, ans=0.2
2024-08-14 15:03:26,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10450, loss[loss=0.1105, beats_loss=0.009617, ecapa_loss=0.0001159, whisper_loss=0.09969, over 20620.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001549, whisper_loss=0.09041, over 3872671.85 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:03:29,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2713020.0, ans=0.0
2024-08-14 15:03:39,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2713020.0, ans=0.0
2024-08-14 15:03:39,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2713020.0, ans=0.125
2024-08-14 15:03:43,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0
2024-08-14 15:04:15,538 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS
2024-08-14 15:04:17,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2713320.0, ans=0.2
2024-08-14 15:04:18,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2713320.0, ans=0.025
2024-08-14 15:04:30,349 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 18 from Vox, 17 from AS
2024-08-14 15:04:42,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10500, loss[loss=0.09101, beats_loss=0.01246, ecapa_loss=0.0001569, whisper_loss=0.07698, over 21314.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001554, whisper_loss=0.09076, over 3871915.99 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:04:51,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0
2024-08-14 15:04:55,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.389e+01 2.560e+01 2.877e+01 3.688e+01, threshold=5.121e+01, percent-clipped=0.0
2024-08-14 15:04:57,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2713620.0, ans=0.125
2024-08-14 15:04:57,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2713620.0, ans=0.125
2024-08-14 15:05:13,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2713720.0, ans=15.0
2024-08-14 15:05:19,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2713720.0, ans=0.1
2024-08-14 15:05:21,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=2713720.0, ans=0.02
2024-08-14 15:05:42,626 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 from AS
2024-08-14 15:05:42,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2713920.0, ans=0.0
2024-08-14 15:05:48,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2713920.0, ans=0.0
2024-08-14 15:05:56,572 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10550, loss[loss=0.08416, beats_loss=0.01219, ecapa_loss=0.0001235, whisper_loss=0.07073, over 13951.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001559, whisper_loss=0.09082, over 3843930.31 frames. ], batch size: 54, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:06:13,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2714120.0, ans=0.0
2024-08-14 15:06:25,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2714220.0, ans=0.2
2024-08-14 15:06:27,913 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 19 from Vox, 40 from AS
2024-08-14 15:06:37,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
2024-08-14 15:06:47,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2714320.0, ans=0.125
2024-08-14 15:06:51,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0
2024-08-14 15:06:53,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2714320.0, ans=0.2
2024-08-14 15:07:10,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10600, loss[loss=0.1092, beats_loss=0.009986, ecapa_loss=0.0001457, whisper_loss=0.09776, over 19661.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001556, whisper_loss=0.09009, over 3837274.88 frames. ], batch size: 77, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:07:17,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2714520.0, ans=0.125
2024-08-14 15:07:23,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2714520.0, ans=0.1
2024-08-14 15:07:24,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.333e+01 2.524e+01 2.900e+01 4.921e+01, threshold=5.049e+01, percent-clipped=0.0
2024-08-14 15:07:31,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2714620.0, ans=0.125
2024-08-14 15:07:42,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2714720.0, ans=0.125
2024-08-14 15:07:54,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2714820.0, ans=0.125
2024-08-14 15:08:01,716 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 24 from Vox, 38 from AS
2024-08-14 15:08:11,916 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS
2024-08-14 15:08:23,990 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS
2024-08-14 15:08:25,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10650, loss[loss=0.09525, beats_loss=0.01307, ecapa_loss=0.0001499, whisper_loss=0.08068, over 17740.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01071, ecapa_loss=0.0001545, whisper_loss=0.08987, over 3870356.83 frames. ], batch size: 73, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:08:25,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2715020.0, ans=0.2
2024-08-14 15:08:32,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2715020.0, ans=0.125
2024-08-14 15:08:38,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2715020.0, ans=0.1
2024-08-14 15:08:39,178 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 15:08:45,142 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS
2024-08-14 15:08:56,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2715220.0, ans=0.035
2024-08-14 15:09:01,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2715220.0, ans=0.125
2024-08-14 15:09:03,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-08-14 15:09:06,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2715220.0, ans=0.1
2024-08-14 15:09:26,684 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 from AS
2024-08-14 15:09:34,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2715420.0, ans=0.125
2024-08-14 15:09:39,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10700, loss[loss=0.125, beats_loss=0.008662, ecapa_loss=0.0001492, whisper_loss=0.1148, over 22498.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001529, whisper_loss=0.09005, over 3856407.56 frames. ], batch size: 90, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:09:53,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.367e+01 2.619e+01 3.037e+01 4.020e+01, threshold=5.239e+01, percent-clipped=0.0
2024-08-14 15:10:19,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2715720.0, ans=0.0
2024-08-14 15:10:35,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2024-08-14 15:10:53,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2716020.0, ans=0.1
2024-08-14 15:10:53,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2716020.0, ans=0.0
2024-08-14 15:10:54,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10750, loss[loss=0.1086, beats_loss=0.01107, ecapa_loss=0.0001457, whisper_loss=0.09611, over 17156.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001539, whisper_loss=0.09074, over 3855430.85 frames. ], batch size: 68, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:11:15,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2716120.0, ans=0.125
2024-08-14 15:11:21,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2716120.0, ans=0.2
2024-08-14 15:11:22,116 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 20 from Vox, 31 from AS
2024-08-14 15:11:29,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2716220.0, ans=0.125
2024-08-14 15:11:32,884 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 15:11:34,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2716220.0, ans=0.125
2024-08-14 15:11:54,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2716420.0, ans=0.0
2024-08-14 15:12:01,585 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 from AS
2024-08-14 15:12:02,949 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 from AS
2024-08-14 15:12:05,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=15.0
2024-08-14 15:12:09,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10800, loss[loss=0.09327, beats_loss=0.01359, ecapa_loss=0.0001418, whisper_loss=0.07826, over 20134.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01075, ecapa_loss=0.0001549, whisper_loss=0.09154, over 3891325.75 frames. ], batch size: 84, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:12:10,132 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 13 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 15:12:14,888 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 from AS
2024-08-14 15:12:23,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.404e+01 2.650e+01 3.101e+01 5.207e+01, threshold=5.300e+01, percent-clipped=0.0
2024-08-14 15:12:27,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2716620.0, ans=0.125
2024-08-14 15:12:54,249 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 from AS
2024-08-14 15:13:00,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2716820.0, ans=0.0
2024-08-14 15:13:11,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2716920.0, ans=0.025
2024-08-14 15:13:23,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10850, loss[loss=0.09173, beats_loss=0.01104, ecapa_loss=0.0001467, whisper_loss=0.07923, over 17412.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01069, ecapa_loss=0.0001556, whisper_loss=0.09248, over 3919566.22 frames. ], batch size: 69, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:13:28,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2717020.0, ans=0.0
2024-08-14 15:13:36,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2717020.0, ans=0.2
2024-08-14 15:13:41,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2717120.0, ans=0.0
2024-08-14 15:13:47,049 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 21 from Vox, 25 from AS
2024-08-14 15:14:09,935 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS
2024-08-14 15:14:13,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2717320.0, ans=0.0
2024-08-14 15:14:13,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0
2024-08-14 15:14:23,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2717420.0, ans=0.0
2024-08-14 15:14:29,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2717420.0, ans=0.125
2024-08-14 15:14:34,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0
2024-08-14 15:14:39,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10900, loss[loss=0.1201, beats_loss=0.008847, ecapa_loss=0.0001622, whisper_loss=0.1096, over 13643.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01067, ecapa_loss=0.0001551, whisper_loss=0.09245, over 3908239.47 frames. ], batch size: 53, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:14:45,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2717520.0, ans=0.04949747468305833
2024-08-14 15:14:52,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.325e+01 2.589e+01 2.879e+01 4.786e+01, threshold=5.178e+01, percent-clipped=0.0
2024-08-14 15:15:19,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2717720.0, ans=0.0
2024-08-14 15:15:28,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2717820.0, ans=0.2
2024-08-14 15:15:36,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2717820.0, ans=0.0
2024-08-14 15:15:53,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 10950, loss[loss=0.1049, beats_loss=0.0103, ecapa_loss=0.0001683, whisper_loss=0.09296, over 19127.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01067, ecapa_loss=0.0001557, whisper_loss=0.09238, over 3936852.88 frames. ], batch size: 75, lr: 3.27e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:16:05,804 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 from AS
2024-08-14 15:16:07,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0
2024-08-14 15:16:09,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2718120.0, ans=0.07
2024-08-14 15:16:43,068 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS
2024-08-14 15:16:57,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2718420.0, ans=0.125
2024-08-14 15:17:10,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11000, loss[loss=0.1056, beats_loss=0.01117, ecapa_loss=0.0001484, whisper_loss=0.09291, over 20006.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001549, whisper_loss=0.09209, over 3927709.31 frames. ], batch size: 77, lr: 3.26e-03, grad_scale: 5.764607523034235e+17
2024-08-14 15:17:22,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.70 vs. limit=15.0
2024-08-14 15:17:25,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.292e+01 2.575e+01 2.886e+01 4.359e+01, threshold=5.149e+01, percent-clipped=0.0
2024-08-14 15:17:31,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2718620.0, ans=0.1
2024-08-14 15:17:47,960 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 from AS
2024-08-14 15:18:04,302 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 from AS
2024-08-14 15:18:19,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2718920.0, ans=0.1
2024-08-14 15:18:27,866 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 31 from Vox, 34 from AS
2024-08-14 15:18:30,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs.
limit=15.0 2024-08-14 15:18:34,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11050, loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001816, whisper_loss=0.08944, over 19785.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01064, ecapa_loss=0.0001563, whisper_loss=0.09233, over 3934729.42 frames. ], batch size: 82, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:18:34,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2719020.0, ans=0.1 2024-08-14 15:18:41,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2719020.0, ans=0.125 2024-08-14 15:18:44,513 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 15:18:56,801 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-14 15:19:06,492 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-14 15:19:16,191 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-14 15:19:46,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2719420.0, ans=0.0 2024-08-14 15:20:00,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11100, loss[loss=0.08583, beats_loss=0.01101, ecapa_loss=0.0001857, whisper_loss=0.07296, over 15790.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001565, whisper_loss=0.09177, over 3897652.07 frames. 
], batch size: 66, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:20:14,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.445e+01 2.651e+01 2.947e+01 5.465e+01, threshold=5.303e+01, percent-clipped=1.0 2024-08-14 15:20:19,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2719620.0, ans=0.125 2024-08-14 15:20:33,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2719720.0, ans=0.125 2024-08-14 15:20:48,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-14 15:20:58,262 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 15:21:17,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2719920.0, ans=0.0 2024-08-14 15:21:19,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11150, loss[loss=0.1341, beats_loss=0.006768, ecapa_loss=0.000165, whisper_loss=0.1257, over 19593.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001563, whisper_loss=0.09212, over 3917913.94 frames. 
], batch size: 72, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:21:29,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2720020.0, ans=0.1 2024-08-14 15:22:12,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2720320.0, ans=0.125 2024-08-14 15:22:13,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2720320.0, ans=10.0 2024-08-14 15:22:14,538 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 15:22:21,823 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 37 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 15:22:28,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=12.0 2024-08-14 15:22:33,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11200, loss[loss=0.1192, beats_loss=0.01079, ecapa_loss=0.0001166, whisper_loss=0.1073, over 23125.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01062, ecapa_loss=0.0001568, whisper_loss=0.09187, over 3877485.32 frames. 
], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:22:38,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2720520.0, ans=0.0 2024-08-14 15:22:46,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.435e+01 2.587e+01 2.892e+01 4.591e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-14 15:23:03,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2720720.0, ans=0.125 2024-08-14 15:23:07,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2720720.0, ans=0.125 2024-08-14 15:23:12,895 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 15:23:21,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2720820.0, ans=0.0 2024-08-14 15:23:36,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2720920.0, ans=0.125 2024-08-14 15:23:47,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11250, loss[loss=0.1011, beats_loss=0.01243, ecapa_loss=0.0001373, whisper_loss=0.0873, over 15610.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.000156, whisper_loss=0.09145, over 3863154.16 frames. ], batch size: 62, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:23:47,685 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-14 15:24:42,797 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-14 15:24:52,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2721420.0, ans=0.125 2024-08-14 15:25:08,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11300, loss[loss=0.1058, beats_loss=0.008623, ecapa_loss=0.0002249, whisper_loss=0.09489, over 20041.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01075, ecapa_loss=0.0001548, whisper_loss=0.09019, over 3877667.47 frames. ], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:25:12,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2721520.0, ans=0.125 2024-08-14 15:25:16,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-08-14 15:25:21,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.316e+01 2.542e+01 2.891e+01 3.051e+02, threshold=5.084e+01, percent-clipped=1.0 2024-08-14 15:25:23,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2721620.0, ans=0.125 2024-08-14 15:25:24,634 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 15:25:28,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-14 15:25:42,920 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 15:25:51,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2721820.0, ans=0.125 2024-08-14 15:26:22,356 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 15:26:24,216 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 15:26:25,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11350, loss[loss=0.1228, beats_loss=0.009773, ecapa_loss=0.0001521, whisper_loss=0.1115, over 23555.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01064, ecapa_loss=0.0001547, whisper_loss=0.09145, over 3875577.72 frames. ], batch size: 91, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:26:30,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2722020.0, ans=0.09899494936611666 2024-08-14 15:26:43,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2722120.0, ans=0.0 2024-08-14 15:27:02,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2722220.0, ans=0.125 2024-08-14 15:27:15,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2722220.0, ans=0.07 2024-08-14 15:27:52,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2722420.0, ans=0.125 2024-08-14 15:27:55,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-14 15:27:59,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11400, loss[loss=0.1226, beats_loss=0.0101, ecapa_loss=0.0001359, whisper_loss=0.1111, over 21666.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01071, ecapa_loss=0.0001539, whisper_loss=0.09157, over 3887621.09 frames. 
], batch size: 84, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:28:09,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2722520.0, ans=0.125 2024-08-14 15:28:12,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2722520.0, ans=0.0 2024-08-14 15:28:13,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.371e+01 2.609e+01 2.947e+01 4.785e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-14 15:28:16,825 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 15:28:24,962 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 15:28:35,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2722720.0, ans=0.1 2024-08-14 15:29:00,489 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 15:29:07,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-14 15:29:19,664 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 15:29:31,502 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11450, loss[loss=0.1199, beats_loss=0.008154, ecapa_loss=0.0001822, whisper_loss=0.1099, over 19279.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.09194, over 3923084.35 frames. 
], batch size: 75, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:29:33,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2723020.0, ans=0.125 2024-08-14 15:29:55,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=15.0 2024-08-14 15:30:09,099 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 15:30:09,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2723120.0, ans=0.125 2024-08-14 15:31:11,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2723420.0, ans=0.0 2024-08-14 15:31:29,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2723520.0, ans=0.125 2024-08-14 15:31:30,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11500, loss[loss=0.08476, beats_loss=0.009583, ecapa_loss=0.0001741, whisper_loss=0.07344, over 21159.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01065, ecapa_loss=0.0001544, whisper_loss=0.09218, over 3932770.59 frames. 
], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:31:52,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.370e+01 2.644e+01 2.916e+01 4.086e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-14 15:32:18,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2723720.0, ans=0.1 2024-08-14 15:32:20,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2723720.0, ans=0.0 2024-08-14 15:32:23,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2723720.0, ans=0.125 2024-08-14 15:32:27,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2723720.0, ans=0.0 2024-08-14 15:32:39,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2723720.0, ans=0.0 2024-08-14 15:33:23,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.579e-03 2024-08-14 15:33:30,255 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 15:33:31,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11550, loss[loss=0.0899, beats_loss=0.008781, ecapa_loss=0.0001849, whisper_loss=0.07927, over 14775.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001551, whisper_loss=0.09206, over 3927255.44 frames. ], batch size: 60, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:33:38,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2724020.0, ans=0.125 2024-08-14 15:34:14,993 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-14 15:34:50,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2724320.0, ans=0.125 2024-08-14 15:35:01,902 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-14 15:35:08,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2724420.0, ans=0.125 2024-08-14 15:35:11,125 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 15:35:16,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11600, loss[loss=0.1043, beats_loss=0.01217, ecapa_loss=0.0001172, whisper_loss=0.09094, over 18870.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001544, whisper_loss=0.09088, over 3913761.87 frames. ], batch size: 72, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:35:24,039 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.845e+00 2024-08-14 15:35:28,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-14 15:35:29,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.402e+01 2.609e+01 2.881e+01 4.573e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-14 15:35:51,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2724720.0, ans=0.125 2024-08-14 15:35:53,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2724720.0, ans=0.125 2024-08-14 15:35:57,100 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 15:36:02,516 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-14 15:36:15,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2724920.0, ans=0.125 2024-08-14 15:36:25,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2724920.0, ans=10.0 2024-08-14 15:36:28,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11650, loss[loss=0.1117, beats_loss=0.01135, ecapa_loss=0.0001374, whisper_loss=0.09898, over 22968.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0107, ecapa_loss=0.0001547, whisper_loss=0.09152, over 3947827.46 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:36:31,621 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 15:36:48,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2725120.0, ans=0.125 2024-08-14 15:36:49,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2725120.0, ans=0.2 2024-08-14 15:37:01,454 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 15:37:03,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2725220.0, ans=0.125 2024-08-14 15:37:06,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2725220.0, ans=0.125 2024-08-14 15:37:08,716 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 15:37:15,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2725320.0, ans=0.0 2024-08-14 15:37:16,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2725320.0, ans=0.125 2024-08-14 15:37:26,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=12.0 2024-08-14 15:37:27,143 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 15:37:34,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2725420.0, ans=0.125 2024-08-14 15:37:44,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11700, loss[loss=0.1238, beats_loss=0.007199, ecapa_loss=0.0001466, whisper_loss=0.1152, over 22118.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01069, ecapa_loss=0.000154, whisper_loss=0.09213, over 3932076.13 frames. ], batch size: 81, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:37:55,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2725520.0, ans=0.125 2024-08-14 15:37:57,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2024-08-14 15:37:59,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.338e+01 2.598e+01 2.950e+01 6.638e+01, threshold=5.196e+01, percent-clipped=2.0 2024-08-14 15:38:05,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2725620.0, ans=0.0 2024-08-14 15:38:17,255 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
25 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-14 15:38:18,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-08-14 15:38:33,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0 2024-08-14 15:38:36,809 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-14 15:38:38,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2024-08-14 15:38:41,182 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 12 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-14 15:38:43,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=12.0 2024-08-14 15:38:57,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2725920.0, ans=0.05 2024-08-14 15:38:58,857 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-14 15:39:00,362 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 15:39:11,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11750, loss[loss=0.1038, beats_loss=0.009495, ecapa_loss=0.0001312, whisper_loss=0.09301, over 14996.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.09185, over 3920219.21 frames. 
], batch size: 57, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:39:29,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2726120.0, ans=0.0 2024-08-14 15:39:59,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2726220.0, ans=0.2 2024-08-14 15:40:11,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2726320.0, ans=0.125 2024-08-14 15:40:12,802 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 15:40:19,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2726420.0, ans=0.125 2024-08-14 15:40:32,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11800, loss[loss=0.1046, beats_loss=0.01018, ecapa_loss=0.0001723, whisper_loss=0.0927, over 21627.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01072, ecapa_loss=0.0001547, whisper_loss=0.09287, over 3917811.15 frames. 
], batch size: 92, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:40:45,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.511e+01 2.719e+01 3.108e+01 4.014e+02, threshold=5.439e+01, percent-clipped=2.0 2024-08-14 15:40:47,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2726620.0, ans=0.125 2024-08-14 15:40:59,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2726620.0, ans=0.125 2024-08-14 15:41:07,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2726720.0, ans=0.125 2024-08-14 15:41:10,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2726720.0, ans=0.125 2024-08-14 15:41:14,537 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 32 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-14 15:41:22,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2726820.0, ans=0.1 2024-08-14 15:41:26,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2726820.0, ans=0.0 2024-08-14 15:41:32,053 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 15:41:44,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11850, loss[loss=0.09695, beats_loss=0.009897, ecapa_loss=0.0001955, whisper_loss=0.0851, over 15724.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01074, ecapa_loss=0.0001548, whisper_loss=0.09262, over 3918897.94 frames. ], batch size: 65, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:41:56,023 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 15:42:00,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2727120.0, ans=0.125 2024-08-14 15:42:09,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2727120.0, ans=10.0 2024-08-14 15:42:23,786 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 15:42:24,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-14 15:42:34,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2024-08-14 15:42:40,720 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 10 from Vox, 44 fro AS 2024-08-14 15:42:41,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2727420.0, ans=0.2 2024-08-14 15:42:49,456 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 15:42:56,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11900, loss[loss=0.08467, beats_loss=0.01085, ecapa_loss=0.0001847, whisper_loss=0.07197, over 18342.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01071, ecapa_loss=0.0001552, whisper_loss=0.09257, over 3940912.41 frames. 
], batch size: 77, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:43:04,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2727520.0, ans=0.1 2024-08-14 15:43:09,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.301e+01 2.664e+01 2.917e+01 5.181e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-14 15:43:13,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2727620.0, ans=0.0 2024-08-14 15:43:21,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2727620.0, ans=0.0 2024-08-14 15:43:25,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2024-08-14 15:43:40,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=12.0 2024-08-14 15:43:41,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.29 vs. limit=10.0 2024-08-14 15:43:43,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2727820.0, ans=0.0 2024-08-14 15:43:43,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2727820.0, ans=0.125 2024-08-14 15:43:49,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2727820.0, ans=0.125 2024-08-14 15:44:03,125 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
22 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-14 15:44:04,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-14 15:44:05,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2024-08-14 15:44:09,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 11950, loss[loss=0.08924, beats_loss=0.01155, ecapa_loss=0.000137, whisper_loss=0.07632, over 20030.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09217, over 3927270.29 frames. ], batch size: 79, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:44:11,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2728020.0, ans=0.2 2024-08-14 15:44:54,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2024-08-14 15:45:13,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2728420.0, ans=10.0 2024-08-14 15:45:23,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12000, loss[loss=0.07807, beats_loss=0.01435, ecapa_loss=0.0001426, whisper_loss=0.0623, over 17116.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001555, whisper_loss=0.09089, over 3901449.05 frames. ], batch size: 72, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:45:23,066 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 15:46:00,636 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.000545, whisper_loss=0.2473, over 922467.00 frames. 
2024-08-14 15:46:18,325 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on SV_voxceleb1: loss=0.004271, beats_loss=0, ecapa_loss=0.0004271, whisper_loss=0, over 939242.00 frames. 2024-08-14 15:47:39,380 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8963, 2.2320, 2.6356, 3.0678], device='cuda:3') 2024-08-14 15:47:54,978 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.8037, 2.6194, 2.7890, 2.4624, 3.1942, 2.6404, 2.6208, 2.3257], device='cuda:3') 2024-08-14 15:48:10,159 INFO [train_multi_KD3.py:1149] (3/4) Epoch 19, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 15:48:10,163 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 15:48:15,010 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 15:48:15,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2024-08-14 15:48:17,729 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 15:48:23,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.361e+01 2.603e+01 2.893e+01 4.151e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 15:48:30,619 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-14 15:48:33,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2728620.0, ans=0.2 2024-08-14 15:48:38,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.44 vs. 
limit=12.0 2024-08-14 15:48:45,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-08-14 15:48:51,148 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 15:48:58,604 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-14 15:49:18,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2728920.0, ans=0.125 2024-08-14 15:49:25,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12050, loss[loss=0.1023, beats_loss=0.008846, ecapa_loss=0.0001506, whisper_loss=0.09198, over 22341.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001558, whisper_loss=0.09085, over 3865830.73 frames. ], batch size: 88, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:49:28,284 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 18 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-14 15:49:36,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2729020.0, ans=0.125 2024-08-14 15:49:39,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0 2024-08-14 15:49:47,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=2729120.0, ans=0.1 2024-08-14 15:50:01,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2729220.0, ans=0.1 2024-08-14 15:50:09,862 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 15:50:26,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-14 15:50:34,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2729420.0, ans=0.125 2024-08-14 15:50:39,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12100, loss[loss=0.1143, beats_loss=0.00942, ecapa_loss=0.000175, whisper_loss=0.1031, over 21447.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001572, whisper_loss=0.09161, over 3844001.16 frames. ], batch size: 87, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:50:42,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=2729520.0, ans=0.02 2024-08-14 15:50:52,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.283e+01 2.551e+01 2.892e+01 3.951e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-14 15:50:56,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729620.0, ans=0.1 2024-08-14 15:51:10,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.44 vs. limit=6.0 2024-08-14 15:51:14,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729720.0, ans=0.1 2024-08-14 15:51:16,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2729720.0, ans=0.125 2024-08-14 15:51:37,650 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 15:51:37,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2729920.0, ans=0.125 2024-08-14 15:51:49,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2729920.0, ans=0.125 2024-08-14 15:51:51,837 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12150, loss[loss=0.1113, beats_loss=0.011, ecapa_loss=0.0001326, whisper_loss=0.09893, over 22856.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.000156, whisper_loss=0.09094, over 3853634.57 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:51:55,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2730020.0, ans=0.125 2024-08-14 15:52:19,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.13 vs. limit=10.0 2024-08-14 15:52:35,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2730320.0, ans=0.2 2024-08-14 15:53:06,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12200, loss[loss=0.109, beats_loss=0.009346, ecapa_loss=0.0001607, whisper_loss=0.09808, over 21829.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001563, whisper_loss=0.09032, over 3855762.57 frames. 
], batch size: 89, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:53:19,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.397e+01 2.639e+01 2.869e+01 4.830e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-14 15:53:39,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2730720.0, ans=0.0 2024-08-14 15:53:55,278 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-14 15:53:59,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2730820.0, ans=0.1 2024-08-14 15:54:19,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12250, loss[loss=0.1162, beats_loss=0.00996, ecapa_loss=0.0001217, whisper_loss=0.1051, over 23640.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01057, ecapa_loss=0.0001573, whisper_loss=0.09111, over 3842719.46 frames. ], batch size: 90, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:54:28,708 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 15:54:43,177 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 15:54:43,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2731120.0, ans=0.125 2024-08-14 15:54:46,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2731120.0, ans=0.2 2024-08-14 15:54:49,333 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 15:55:03,923 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 15:55:32,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12300, loss[loss=0.102, beats_loss=0.01215, ecapa_loss=0.0001012, whisper_loss=0.08881, over 17216.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001574, whisper_loss=0.09081, over 3850138.85 frames. ], batch size: 63, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:55:38,874 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-14 15:55:44,998 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 15:55:46,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.391e+01 2.726e+01 3.127e+01 1.434e+02, threshold=5.452e+01, percent-clipped=1.0 2024-08-14 15:56:40,675 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 15:56:43,557 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 15:56:46,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12350, loss[loss=0.1239, beats_loss=0.008188, ecapa_loss=0.0001747, whisper_loss=0.114, over 16052.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001588, whisper_loss=0.09137, over 3844826.81 frames. ], batch size: 62, lr: 3.26e-03, grad_scale: 1.152921504606847e+18 2024-08-14 15:56:48,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2732020.0, ans=0.025 2024-08-14 15:56:56,605 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 15:56:56,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2732020.0, ans=0.125 2024-08-14 15:57:01,969 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
20 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 15:57:20,180 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 15:57:24,432 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-14 15:57:35,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-14 15:57:42,376 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 15:57:45,311 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 15:57:57,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2732420.0, ans=0.0 2024-08-14 15:57:59,050 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-14 15:58:00,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12400, loss[loss=0.1076, beats_loss=0.0095, ecapa_loss=0.0001362, whisper_loss=0.09673, over 21966.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.000158, whisper_loss=0.09154, over 3850601.66 frames. 
], batch size: 87, lr: 3.26e-03, grad_scale: 1.152921504606847e+18 2024-08-14 15:58:13,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.330e+01 2.578e+01 2.980e+01 5.348e+02, threshold=5.156e+01, percent-clipped=2.0 2024-08-14 15:58:38,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2732720.0, ans=0.125 2024-08-14 15:59:02,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2732920.0, ans=0.125 2024-08-14 15:59:04,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2732920.0, ans=0.2 2024-08-14 15:59:13,719 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-14 15:59:14,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-14 15:59:14,827 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12450, loss[loss=0.08873, beats_loss=0.01283, ecapa_loss=0.0001318, whisper_loss=0.07458, over 20534.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01045, ecapa_loss=0.0001578, whisper_loss=0.09178, over 3839032.55 frames. 
], batch size: 81, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 15:59:26,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2733020.0, ans=0.05 2024-08-14 15:59:37,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2733120.0, ans=0.2 2024-08-14 15:59:43,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2733220.0, ans=0.0 2024-08-14 15:59:56,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2733220.0, ans=0.0 2024-08-14 15:59:59,421 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 16:00:08,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2733320.0, ans=0.125 2024-08-14 16:00:16,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733420.0, ans=0.1 2024-08-14 16:00:29,435 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-14 16:00:30,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12500, loss[loss=0.07799, beats_loss=0.01045, ecapa_loss=0.0001459, whisper_loss=0.06608, over 15917.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01041, ecapa_loss=0.0001572, whisper_loss=0.0919, over 3833322.01 frames. 
], batch size: 61, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:00:45,937 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.338e+01 2.506e+01 2.817e+01 7.820e+01, threshold=5.011e+01, percent-clipped=1.0 2024-08-14 16:00:58,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2733620.0, ans=10.0 2024-08-14 16:01:03,097 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 10 from Vox, 41 fro AS 2024-08-14 16:01:16,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-14 16:01:28,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2733820.0, ans=0.0 2024-08-14 16:01:30,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=15.0 2024-08-14 16:01:46,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12550, loss[loss=0.1227, beats_loss=0.009116, ecapa_loss=0.0001828, whisper_loss=0.1117, over 22173.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01045, ecapa_loss=0.0001564, whisper_loss=0.09247, over 3843048.88 frames. ], batch size: 88, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:02:23,069 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 16:02:23,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2734220.0, ans=0.1 2024-08-14 16:02:39,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. 
limit=15.0 2024-08-14 16:02:58,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2734420.0, ans=0.125 2024-08-14 16:03:00,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12600, loss[loss=0.09669, beats_loss=0.01224, ecapa_loss=0.000154, whisper_loss=0.08291, over 18641.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01051, ecapa_loss=0.000156, whisper_loss=0.09238, over 3845957.54 frames. ], batch size: 77, lr: 3.26e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:03:10,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2734520.0, ans=0.04949747468305833 2024-08-14 16:03:14,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.270e+01 2.592e+01 3.036e+01 4.281e+01, threshold=5.185e+01, percent-clipped=0.0 2024-08-14 16:03:23,745 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 16:03:24,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2734620.0, ans=0.125 2024-08-14 16:03:35,961 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 16:03:50,568 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 16:04:04,294 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 18 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-14 16:04:04,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2734920.0, ans=0.125 2024-08-14 16:04:05,642 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 16:04:14,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12650, loss[loss=0.07762, beats_loss=0.009379, ecapa_loss=0.0001884, whisper_loss=0.06635, over 14114.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001548, whisper_loss=0.09156, over 3833265.21 frames. ], batch size: 58, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:04:15,942 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 16:04:21,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2024-08-14 16:04:22,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2735020.0, ans=0.1 2024-08-14 16:04:23,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2735020.0, ans=0.125 2024-08-14 16:04:29,400 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 16:04:32,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2735120.0, ans=0.125 2024-08-14 16:04:56,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2735220.0, ans=0.0 2024-08-14 16:05:02,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2735320.0, ans=0.125 2024-08-14 16:05:12,061 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-14 16:05:12,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2735420.0, ans=0.125 2024-08-14 16:05:13,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2735420.0, ans=0.2 2024-08-14 16:05:27,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12700, loss[loss=0.1219, beats_loss=0.008984, ecapa_loss=0.0001571, whisper_loss=0.1114, over 15889.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01076, ecapa_loss=0.0001545, whisper_loss=0.09112, over 3870206.94 frames. ], batch size: 60, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:05:42,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.369e+01 2.524e+01 2.927e+01 4.569e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-14 16:05:54,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2735620.0, ans=0.0 2024-08-14 16:05:57,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2735720.0, ans=0.125 2024-08-14 16:06:00,177 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 16:06:04,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2735720.0, ans=0.125 2024-08-14 16:06:15,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2735820.0, ans=0.125 2024-08-14 16:06:19,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2735820.0, ans=0.125 2024-08-14 16:06:23,565 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 16:06:33,816 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 16:06:41,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12750, loss[loss=0.1106, beats_loss=0.009722, ecapa_loss=0.0001233, whisper_loss=0.09968, over 15045.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01076, ecapa_loss=0.000155, whisper_loss=0.09115, over 3865002.54 frames. ], batch size: 56, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:06:57,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2736120.0, ans=0.125 2024-08-14 16:07:02,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2736120.0, ans=0.125 2024-08-14 16:07:10,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2736220.0, ans=0.125 2024-08-14 16:07:38,572 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 16:07:48,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-08-14 16:07:55,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12800, loss[loss=0.09384, beats_loss=0.01257, ecapa_loss=0.0001394, whisper_loss=0.07987, over 20224.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001554, whisper_loss=0.09081, over 3836790.03 frames. ], batch size: 80, lr: 3.25e-03, grad_scale: 5.764607523034235e+17 2024-08-14 16:07:56,755 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 16:07:59,581 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 16:08:01,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2736520.0, ans=0.0 2024-08-14 16:08:01,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=12.0 2024-08-14 16:08:09,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.300e+01 2.515e+01 2.756e+01 3.404e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-14 16:08:23,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-14 16:08:47,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2736820.0, ans=0.2 2024-08-14 16:08:53,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2736920.0, ans=0.0 2024-08-14 16:09:00,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2736920.0, ans=0.125 2024-08-14 16:09:03,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2736920.0, ans=0.035 2024-08-14 16:09:05,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2736920.0, ans=0.125 2024-08-14 16:09:08,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2737020.0, ans=0.0 2024-08-14 16:09:09,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12850, loss[loss=0.1102, beats_loss=0.008703, ecapa_loss=0.000138, whisper_loss=0.1001, over 16860.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01078, ecapa_loss=0.0001559, whisper_loss=0.09042, over 3822741.13 frames. 
], batch size: 65, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:09:30,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2737120.0, ans=0.125
2024-08-14 16:09:45,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2737220.0, ans=0.125
2024-08-14 16:09:48,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.20 vs. limit=22.5
2024-08-14 16:09:56,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2737320.0, ans=0.0
2024-08-14 16:10:08,775 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 from AS
2024-08-14 16:10:21,424 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12900, loss[loss=0.1089, beats_loss=0.01119, ecapa_loss=0.0001794, whisper_loss=0.09595, over 18741.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001567, whisper_loss=0.09041, over 3814937.23 frames. ], batch size: 79, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:10:33,091 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 from AS
2024-08-14 16:10:35,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.212e+01 2.559e+01 2.809e+01 4.062e+01, threshold=5.118e+01, percent-clipped=0.0
2024-08-14 16:10:37,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2737620.0, ans=0.1
2024-08-14 16:10:51,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2737720.0, ans=0.1
2024-08-14 16:10:53,515 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 from AS
2024-08-14 16:11:10,178 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 16:11:18,715 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 16:11:27,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737920.0, ans=0.1
2024-08-14 16:11:33,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738020.0, ans=0.1
2024-08-14 16:11:34,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 12950, loss[loss=0.1228, beats_loss=0.008976, ecapa_loss=0.0001838, whisper_loss=0.112, over 18806.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001563, whisper_loss=0.09077, over 3826470.28 frames. ], batch size: 72, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:12:08,750 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 16:12:27,830 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.702e+00
2024-08-14 16:12:37,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2738420.0, ans=0.125
2024-08-14 16:12:48,192 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 17 from Vox, 23 from AS
2024-08-14 16:12:49,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13000, loss[loss=0.1203, beats_loss=0.007882, ecapa_loss=0.0001605, whisper_loss=0.1108, over 17199.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001552, whisper_loss=0.09114, over 3865323.34 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:12:54,348 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 from AS
2024-08-14 16:13:03,651 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 16:13:04,749 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.365e+01 2.543e+01 2.775e+01 1.627e+02, threshold=5.086e+01, percent-clipped=3.0
2024-08-14 16:13:09,883 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 16:13:10,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5
2024-08-14 16:13:12,516 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS
2024-08-14 16:13:26,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2738720.0, ans=0.125
2024-08-14 16:13:36,268 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 from AS
2024-08-14 16:14:05,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13050, loss[loss=0.104, beats_loss=0.009245, ecapa_loss=0.0001911, whisper_loss=0.09283, over 13890.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01068, ecapa_loss=0.0001552, whisper_loss=0.09142, over 3852688.08 frames. ], batch size: 55, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:14:39,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2739220.0, ans=0.125
2024-08-14 16:15:00,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2739320.0, ans=0.0
2024-08-14 16:15:03,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2739420.0, ans=0.2
2024-08-14 16:15:07,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2739420.0, ans=10.0
2024-08-14 16:15:15,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2739420.0, ans=0.0
2024-08-14 16:15:18,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13100, loss[loss=0.1085, beats_loss=0.01141, ecapa_loss=0.0001352, whisper_loss=0.09571, over 17489.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01076, ecapa_loss=0.0001547, whisper_loss=0.09034, over 3817397.82 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:15:23,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0
2024-08-14 16:15:25,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0
2024-08-14 16:15:27,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2739520.0, ans=0.2
2024-08-14 16:15:33,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.291e+01 2.498e+01 2.880e+01 4.346e+01, threshold=4.996e+01, percent-clipped=0.0
2024-08-14 16:15:37,020 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 from AS
2024-08-14 16:15:38,498 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-14 16:15:41,570 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 26 from Vox, 24 from AS
2024-08-14 16:15:49,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2739720.0, ans=0.05
2024-08-14 16:16:07,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2739820.0, ans=0.09899494936611666
2024-08-14 16:16:07,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2739820.0, ans=0.09899494936611666
2024-08-14 16:16:13,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0
2024-08-14 16:16:16,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.90 vs. limit=10.0
2024-08-14 16:16:19,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2739920.0, ans=0.125
2024-08-14 16:16:24,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2739920.0, ans=0.5
2024-08-14 16:16:33,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13150, loss[loss=0.1004, beats_loss=0.008104, ecapa_loss=0.0001386, whisper_loss=0.09095, over 16024.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001549, whisper_loss=0.09112, over 3835052.07 frames. ], batch size: 59, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:16:45,486 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 from AS
2024-08-14 16:16:45,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2740020.0, ans=0.0
2024-08-14 16:16:59,212 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 16:17:09,570 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 from AS
2024-08-14 16:17:37,203 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 from AS
2024-08-14 16:17:47,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13200, loss[loss=0.09943, beats_loss=0.01137, ecapa_loss=0.0001819, whisper_loss=0.08624, over 15683.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.000155, whisper_loss=0.09135, over 3836012.46 frames. ], batch size: 64, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:17:49,542 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 28 from Vox, 29 from AS
2024-08-14 16:17:55,600 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 16:18:02,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.404e+01 2.825e+01 3.249e+01 1.605e+02, threshold=5.649e+01, percent-clipped=1.0
2024-08-14 16:18:36,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.14 vs. limit=22.5
2024-08-14 16:18:43,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2740820.0, ans=0.125
2024-08-14 16:18:49,794 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 from AS
2024-08-14 16:18:55,721 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS
2024-08-14 16:19:00,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13250, loss[loss=0.08295, beats_loss=0.01329, ecapa_loss=0.0001196, whisper_loss=0.06847, over 18216.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001562, whisper_loss=0.09147, over 3826687.01 frames. ], batch size: 72, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:19:02,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2741020.0, ans=0.0
2024-08-14 16:19:03,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0
2024-08-14 16:19:12,468 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 34 from LS+wenet, 14 from Vox, 25 from AS
2024-08-14 16:19:17,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2741120.0, ans=0.125
2024-08-14 16:19:18,261 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 from AS
2024-08-14 16:19:24,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2741120.0, ans=0.125
2024-08-14 16:19:33,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0
2024-08-14 16:19:36,994 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 16:19:38,733 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 16:19:41,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2741220.0, ans=0.1
2024-08-14 16:19:45,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5
2024-08-14 16:19:49,823 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 from AS
2024-08-14 16:20:12,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13300, loss[loss=0.08952, beats_loss=0.01198, ecapa_loss=0.0001879, whisper_loss=0.07566, over 14799.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001557, whisper_loss=0.0911, over 3847551.05 frames. ], batch size: 60, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:20:18,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2741520.0, ans=0.035
2024-08-14 16:20:20,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0
2024-08-14 16:20:27,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.357e+01 2.636e+01 2.927e+01 4.489e+01, threshold=5.273e+01, percent-clipped=0.0
2024-08-14 16:20:50,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2741720.0, ans=0.125
2024-08-14 16:20:50,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2741720.0, ans=0.2
2024-08-14 16:21:07,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0
2024-08-14 16:21:09,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741820.0, ans=0.1
2024-08-14 16:21:21,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0
2024-08-14 16:21:26,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13350, loss[loss=0.109, beats_loss=0.01105, ecapa_loss=0.0001488, whisper_loss=0.09642, over 20217.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001562, whisper_loss=0.0912, over 3853397.85 frames. ], batch size: 80, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:21:34,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2742020.0, ans=0.0
2024-08-14 16:21:44,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5
2024-08-14 16:21:44,685 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 from AS
2024-08-14 16:21:52,917 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 from AS
2024-08-14 16:21:57,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=22.5
2024-08-14 16:21:58,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2742220.0, ans=0.2
2024-08-14 16:22:21,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2742320.0, ans=0.125
2024-08-14 16:22:27,828 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 15 from LS+wenet, 27 from Vox, 44 from AS
2024-08-14 16:22:41,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13400, loss[loss=0.0852, beats_loss=0.01296, ecapa_loss=0.0001499, whisper_loss=0.07074, over 16327.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001554, whisper_loss=0.09071, over 3865995.79 frames. ], batch size: 69, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:22:55,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.422e+01 2.683e+01 3.045e+01 1.877e+02, threshold=5.367e+01, percent-clipped=2.0
2024-08-14 16:22:59,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2742620.0, ans=0.1
2024-08-14 16:23:02,429 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.409e-02
2024-08-14 16:23:11,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2742720.0, ans=0.125
2024-08-14 16:23:11,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.39 vs. limit=22.5
2024-08-14 16:23:13,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2742720.0, ans=0.0
2024-08-14 16:23:48,023 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 from AS
2024-08-14 16:23:54,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13450, loss[loss=0.09488, beats_loss=0.01188, ecapa_loss=0.0001542, whisper_loss=0.08146, over 21568.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001562, whisper_loss=0.09096, over 3867840.13 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:24:05,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2743020.0, ans=0.125
2024-08-14 16:24:31,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-08-14 16:24:32,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2743220.0, ans=0.0
2024-08-14 16:24:50,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2743320.0, ans=0.1
2024-08-14 16:24:51,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2743320.0, ans=0.125
2024-08-14 16:24:53,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0
2024-08-14 16:25:03,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2743420.0, ans=0.0
2024-08-14 16:25:05,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0
2024-08-14 16:25:08,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13500, loss[loss=0.09383, beats_loss=0.01277, ecapa_loss=0.0001346, whisper_loss=0.07971, over 23994.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001566, whisper_loss=0.09092, over 3864818.71 frames. ], batch size: 95, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:25:22,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2743620.0, ans=0.0
2024-08-14 16:25:23,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.281e+01 2.536e+01 2.815e+01 4.454e+01, threshold=5.072e+01, percent-clipped=0.0
2024-08-14 16:25:27,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.96 vs. limit=22.5
2024-08-14 16:25:44,261 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 24 from Vox, 21 from AS
2024-08-14 16:25:54,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2743820.0, ans=0.0
2024-08-14 16:25:59,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2743820.0, ans=0.1
2024-08-14 16:26:08,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0
2024-08-14 16:26:13,580 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 16:26:16,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2743920.0, ans=0.2
2024-08-14 16:26:16,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2743920.0, ans=0.1
2024-08-14 16:26:22,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13550, loss[loss=0.1293, beats_loss=0.00942, ecapa_loss=0.0001452, whisper_loss=0.1184, over 23642.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001556, whisper_loss=0.09057, over 3882115.55 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:26:22,356 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 from AS
2024-08-14 16:26:24,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2744020.0, ans=0.125
2024-08-14 16:26:32,698 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS
2024-08-14 16:26:37,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2744120.0, ans=0.0
2024-08-14 16:26:37,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2744120.0, ans=0.1
2024-08-14 16:26:43,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2744120.0, ans=0.125
2024-08-14 16:26:46,068 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 from AS
2024-08-14 16:26:47,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2744120.0, ans=0.125
2024-08-14 16:26:52,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2744220.0, ans=0.125
2024-08-14 16:26:56,175 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 from AS
2024-08-14 16:27:14,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2744320.0, ans=0.0
2024-08-14 16:27:34,897 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13600, loss[loss=0.1162, beats_loss=0.008583, ecapa_loss=0.0001582, whisper_loss=0.106, over 22509.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001542, whisper_loss=0.09009, over 3866080.59 frames. ], batch size: 89, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:27:37,028 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 23 from Vox, 31 from AS
2024-08-14 16:27:38,192 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-14 16:27:39,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2744520.0, ans=0.125
2024-08-14 16:27:43,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.65 vs. limit=15.0
2024-08-14 16:27:45,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2744520.0, ans=10.0
2024-08-14 16:27:46,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2744520.0, ans=0.125
2024-08-14 16:27:49,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.289e+01 2.556e+01 2.921e+01 4.683e+01, threshold=5.111e+01, percent-clipped=0.0
2024-08-14 16:27:50,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2744620.0, ans=0.0
2024-08-14 16:27:52,570 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-14 16:27:55,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2744620.0, ans=0.0
2024-08-14 16:28:00,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2744620.0, ans=0.2
2024-08-14 16:28:21,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2744820.0, ans=0.125
2024-08-14 16:28:37,295 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 23 from Vox, 29 from AS
2024-08-14 16:28:48,842 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13650, loss[loss=0.1089, beats_loss=0.01103, ecapa_loss=0.000164, whisper_loss=0.09625, over 23050.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001548, whisper_loss=0.09083, over 3867379.36 frames. ], batch size: 94, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:28:53,272 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 16 from Vox, 39 from AS
2024-08-14 16:29:08,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2745120.0, ans=0.125
2024-08-14 16:29:16,993 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 from AS
2024-08-14 16:29:21,533 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS
2024-08-14 16:29:31,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2745220.0, ans=0.125
2024-08-14 16:29:33,590 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS
2024-08-14 16:29:33,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2745320.0, ans=0.07
2024-08-14 16:29:42,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2745320.0, ans=0.125
2024-08-14 16:29:45,425 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS
2024-08-14 16:29:54,215 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS
2024-08-14 16:30:01,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2745520.0, ans=0.0
2024-08-14 16:30:02,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13700, loss[loss=0.1295, beats_loss=0.008567, ecapa_loss=0.0001585, whisper_loss=0.1193, over 20556.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001537, whisper_loss=0.09122, over 3890860.52 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:30:11,488 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 from AS
2024-08-14 16:30:16,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.313e+01 2.534e+01 2.793e+01 4.098e+01, threshold=5.069e+01, percent-clipped=0.0
2024-08-14 16:30:18,553 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 23 from Vox, 23 from AS
2024-08-14 16:30:27,370 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS
2024-08-14 16:30:29,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2745620.0, ans=0.0
2024-08-14 16:30:35,877 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 from AS
2024-08-14 16:30:38,640 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 24 from LS+wenet, 12 from Vox, 18 from AS
2024-08-14 16:30:43,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0
2024-08-14 16:30:47,295 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 from AS
2024-08-14 16:31:07,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2745920.0, ans=0.125
2024-08-14 16:31:11,992 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 from AS
2024-08-14 16:31:14,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13750, loss[loss=0.1102, beats_loss=0.008706, ecapa_loss=0.0001725, whisper_loss=0.09975, over 15689.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001536, whisper_loss=0.09085, over 3875078.35 frames. ], batch size: 65, lr: 3.25e-03, grad_scale: 5.764607523034235e+17
2024-08-14 16:31:14,998 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 from AS
2024-08-14 16:31:19,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0
2024-08-14 16:31:35,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2746120.0, ans=0.0
2024-08-14 16:31:37,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2746120.0, ans=0.035
2024-08-14 16:31:41,456 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 from AS
2024-08-14 16:31:48,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2746220.0, ans=0.5
2024-08-14 16:31:54,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2746220.0, ans=0.125
2024-08-14 16:31:58,922 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 from AS
2024-08-14 16:32:02,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2746320.0, ans=0.125
2024-08-14 16:32:04,882 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 22 from Vox, 28 from AS
2024-08-14 16:32:26,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2746420.0, ans=0.125
2024-08-14 16:32:27,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2746520.0, ans=0.025
2024-08-14 16:32:28,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13800, loss[loss=0.1041, beats_loss=0.01249, ecapa_loss=0.000125, whisper_loss=0.0904, over 20524.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01067, ecapa_loss=0.0001529, whisper_loss=0.09115, over 3861509.18 frames. ], batch size: 79, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 16:32:42,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2746520.0, ans=0.125
2024-08-14 16:32:45,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.356e+01 2.629e+01 2.983e+01 1.767e+02, threshold=5.258e+01, percent-clipped=3.0
2024-08-14 16:32:53,445 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 16:33:01,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-08-14 16:33:11,157 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 33 from Vox, 34 from AS
2024-08-14 16:33:33,083 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 from AS
2024-08-14 16:33:42,359 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 16:33:43,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13850, loss[loss=0.1363, beats_loss=0.008582, ecapa_loss=0.000175, whisper_loss=0.126, over 22305.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01073, ecapa_loss=0.0001541, whisper_loss=0.09113, over 3894697.60 frames. ], batch size: 91, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 16:33:45,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2747020.0, ans=0.125
2024-08-14 16:33:51,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0
2024-08-14 16:34:05,628 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS
2024-08-14 16:34:10,052 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 17 from Vox, 38 from AS
2024-08-14 16:34:16,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0
2024-08-14 16:34:39,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.05 vs. limit=10.0
2024-08-14 16:34:42,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2747420.0, ans=0.125
2024-08-14 16:34:44,924 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 16:34:56,681 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13900, loss[loss=0.09146, beats_loss=0.009732, ecapa_loss=0.0001864, whisper_loss=0.07987, over 17695.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01064, ecapa_loss=0.0001538, whisper_loss=0.09234, over 3901399.73 frames. ], batch size: 76, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 16:35:01,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2747520.0, ans=0.125
2024-08-14 16:35:08,192 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 from AS
2024-08-14 16:35:12,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.429e+01 2.660e+01 3.144e+01 1.636e+02, threshold=5.320e+01, percent-clipped=3.0
2024-08-14 16:35:17,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2747620.0, ans=0.125
2024-08-14 16:35:28,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2747720.0, ans=0.125
2024-08-14 16:35:31,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2747720.0, ans=0.125
2024-08-14 16:35:48,641 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 20 from Vox, 39 from AS
2024-08-14 16:36:07,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2747920.0, ans=10.0
2024-08-14 16:36:09,804 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 13950, loss[loss=0.1393, beats_loss=0.008209, ecapa_loss=0.0001172, whisper_loss=0.1299, over 14958.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01061, ecapa_loss=0.0001529, whisper_loss=0.09287, over 3862512.24 frames. ], batch size: 54, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 16:36:15,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2748020.0, ans=0.125
2024-08-14 16:36:22,037 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 16:36:28,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2748120.0, ans=0.125
2024-08-14 16:36:30,526 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS
2024-08-14 16:36:38,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748220.0, ans=0.1
2024-08-14 16:36:55,703 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 21 from Vox, 22 from AS
2024-08-14 16:37:02,653 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 from AS
2024-08-14 16:37:08,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2748420.0, ans=0.0
2024-08-14 16:37:17,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5
2024-08-14 16:37:22,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14000, loss[loss=0.09803, beats_loss=0.01177, ecapa_loss=0.0001598, whisper_loss=0.08467, over 21472.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01058, ecapa_loss=0.0001518, whisper_loss=0.09348, over 3888903.46 frames. ], batch size: 90, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 16:37:28,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2748520.0, ans=0.125
2024-08-14 16:37:38,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.341e+01 2.629e+01 3.019e+01 1.116e+02, threshold=5.259e+01, percent-clipped=1.0
2024-08-14 16:37:40,845 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 from AS
2024-08-14 16:37:41,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748620.0, ans=0.1
2024-08-14 16:37:44,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2748620.0, ans=0.05
2024-08-14 16:37:57,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2748720.0, ans=0.125
2024-08-14 16:37:59,889 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-14 16:38:12,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2748820.0, ans=0.0
2024-08-14 16:38:25,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748920.0, ans=0.1
2024-08-14 16:38:36,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14050, loss[loss=0.117, beats_loss=0.009422, ecapa_loss=0.0001986, whisper_loss=0.1056, over 22186.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01059, ecapa_loss=0.0001521, whisper_loss=0.09335, over 3888438.92 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 16:38:43,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2749020.0, ans=0.125
2024-08-14 16:38:51,933 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 16:38:53,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2749120.0, ans=0.0 2024-08-14 16:38:58,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2749120.0, ans=0.125 2024-08-14 16:39:03,834 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 16:39:17,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-14 16:39:29,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.60 vs. limit=10.0 2024-08-14 16:39:34,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2749420.0, ans=0.1 2024-08-14 16:39:40,154 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 15 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 16:39:44,595 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-14 16:39:47,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2749420.0, ans=0.0 2024-08-14 16:39:50,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14100, loss[loss=0.09616, beats_loss=0.01164, ecapa_loss=0.0001621, whisper_loss=0.08289, over 22803.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01071, ecapa_loss=0.0001525, whisper_loss=0.09217, over 3884850.87 frames. 
], batch size: 94, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:39:52,236 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.192e+01 2024-08-14 16:39:56,142 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 16:40:06,748 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.359e+01 2.545e+01 2.723e+01 7.272e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-14 16:40:11,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2749620.0, ans=0.125 2024-08-14 16:40:20,018 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-14 16:40:34,720 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 16:40:35,053 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.612e+05 2024-08-14 16:40:42,214 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 16:41:03,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2750020.0, ans=0.125 2024-08-14 16:41:04,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14150, loss[loss=0.1015, beats_loss=0.0102, ecapa_loss=0.0001676, whisper_loss=0.08967, over 16975.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0107, ecapa_loss=0.0001523, whisper_loss=0.09196, over 3864006.40 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:41:16,548 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-14 16:41:27,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-14 16:41:29,511 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 16:41:35,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2750220.0, ans=0.125 2024-08-14 16:41:52,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2750320.0, ans=0.0 2024-08-14 16:42:18,185 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14200, loss[loss=0.1225, beats_loss=0.008997, ecapa_loss=0.0001326, whisper_loss=0.1122, over 19449.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001519, whisper_loss=0.09182, over 3876954.86 frames. ], batch size: 73, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:42:27,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-14 16:42:34,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.417e+01 2.674e+01 2.957e+01 3.053e+02, threshold=5.348e+01, percent-clipped=2.0 2024-08-14 16:42:36,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2750620.0, ans=0.0 2024-08-14 16:42:37,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2750620.0, ans=0.0 2024-08-14 16:42:54,253 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-14 16:42:54,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2750720.0, ans=0.125 2024-08-14 16:43:02,021 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-14 16:43:10,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-08-14 16:43:32,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14250, loss[loss=0.09603, beats_loss=0.01221, ecapa_loss=0.0001356, whisper_loss=0.08246, over 21600.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01061, ecapa_loss=0.0001515, whisper_loss=0.09294, over 3909523.20 frames. ], batch size: 88, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:43:37,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2751020.0, ans=0.125 2024-08-14 16:43:49,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2751120.0, ans=10.0 2024-08-14 16:43:51,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2751120.0, ans=0.0 2024-08-14 16:43:59,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.87 vs. limit=22.5 2024-08-14 16:44:14,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=12.0 2024-08-14 16:44:21,480 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 16:44:23,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2751320.0, ans=0.0 2024-08-14 16:44:24,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2751320.0, ans=0.2 2024-08-14 16:44:45,258 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14300, loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.0001687, whisper_loss=0.09216, over 22125.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001516, whisper_loss=0.09202, over 3885743.60 frames. ], batch size: 92, lr: 3.25e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:44:47,459 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 16:44:54,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2751520.0, ans=0.1 2024-08-14 16:45:02,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.444e+01 2.637e+01 2.966e+01 4.430e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-14 16:45:02,701 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-14 16:45:06,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2751620.0, ans=0.125 2024-08-14 16:45:29,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.02 vs. 
limit=15.0 2024-08-14 16:45:47,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2751920.0, ans=0.025 2024-08-14 16:45:51,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2751920.0, ans=0.125 2024-08-14 16:45:59,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14350, loss[loss=0.08319, beats_loss=0.01146, ecapa_loss=0.0001885, whisper_loss=0.06985, over 21882.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001528, whisper_loss=0.09149, over 3885738.44 frames. ], batch size: 92, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:46:02,879 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 37 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 16:46:29,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2752220.0, ans=0.09899494936611666 2024-08-14 16:47:02,779 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 16:47:10,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2752420.0, ans=0.0 2024-08-14 16:47:15,255 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 16:47:16,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14400, loss[loss=0.1156, beats_loss=0.009677, ecapa_loss=0.0001444, whisper_loss=0.1045, over 22913.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.0001532, whisper_loss=0.09197, over 3889957.87 frames. 
], batch size: 89, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:47:33,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2752620.0, ans=0.125 2024-08-14 16:47:34,137 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.337e+01 2.636e+01 2.855e+01 4.364e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-14 16:47:50,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2752720.0, ans=0.125 2024-08-14 16:48:10,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2752820.0, ans=10.0 2024-08-14 16:48:11,585 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 35 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-14 16:48:22,816 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-14 16:48:34,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 19, batch 14450, loss[loss=0.1227, beats_loss=0.01166, ecapa_loss=0.0001474, whisper_loss=0.1096, over 23103.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01056, ecapa_loss=0.0001536, whisper_loss=0.0931, over 3866053.00 frames. ], batch size: 91, lr: 3.24e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:48:43,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2753020.0, ans=0.2 2024-08-14 16:49:07,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2024-08-14 16:49:26,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2753320.0, ans=0.1 2024-08-14 16:49:27,817 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 16:50:13,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 0, loss[loss=0.1032, beats_loss=0.00918, ecapa_loss=0.0001418, whisper_loss=0.09262, over 19951.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.00918, ecapa_loss=0.0001418, whisper_loss=0.09262, over 19951.00 frames. ], batch size: 76, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:50:13,400 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 16:50:25,772 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5181, 3.9567, 4.2581, 4.4253], device='cuda:3') 2024-08-14 16:50:38,689 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1574, 3.4167, 2.9872, 2.6112], device='cuda:3') 2024-08-14 16:50:50,512 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005431, whisper_loss=0.2478, over 922467.00 frames. 2024-08-14 16:51:07,159 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on SV_voxceleb1: loss=0.004351, beats_loss=0, ecapa_loss=0.0004351, whisper_loss=0, over 939242.00 frames. 2024-08-14 16:52:53,138 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on AT_audioset: loss=0.02356, beats_loss=0.02356, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 16:52:53,142 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 16:53:01,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0 2024-08-14 16:53:08,753 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 16:53:26,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2024-08-14 16:53:45,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2753620.0, ans=0.125 2024-08-14 16:53:46,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.322e+01 2.623e+01 2.945e+01 5.325e+01, threshold=5.246e+01, percent-clipped=1.0 2024-08-14 16:54:13,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2753720.0, ans=0.1 2024-08-14 16:54:56,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 50, loss[loss=0.07625, beats_loss=0.01211, ecapa_loss=0.000175, whisper_loss=0.06239, over 21040.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009968, ecapa_loss=0.0001599, whisper_loss=0.09005, over 903124.34 frames. ], batch size: 91, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:55:17,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=12.0 2024-08-14 16:55:43,199 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-14 16:55:53,236 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 16:55:57,244 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-14 16:56:01,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2754120.0, ans=0.1 2024-08-14 16:56:12,164 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 16:56:12,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-08-14 16:56:26,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2754220.0, ans=0.0 2024-08-14 16:56:33,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2754320.0, ans=0.95 2024-08-14 16:56:36,712 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 16:56:52,168 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 100, loss[loss=0.09353, beats_loss=0.00904, ecapa_loss=0.0001478, whisper_loss=0.08301, over 16254.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009618, ecapa_loss=0.0001591, whisper_loss=0.09032, over 1549257.33 frames. ], batch size: 63, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:56:54,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=12.0 2024-08-14 16:56:57,827 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 16:57:04,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2754420.0, ans=0.0 2024-08-14 16:57:13,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2754520.0, ans=0.2 2024-08-14 16:57:36,396 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 16:57:38,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.579e+01 2.856e+01 3.069e+01 3.660e+02, threshold=5.711e+01, percent-clipped=1.0 2024-08-14 16:57:46,212 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 16:58:01,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=2754720.0, ans=15.0 2024-08-14 16:58:19,703 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-14 16:58:38,842 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 150, loss[loss=0.1074, beats_loss=0.009045, ecapa_loss=0.0001632, whisper_loss=0.09677, over 23083.00 frames. ], tot_loss[loss=0.102, beats_loss=0.00975, ecapa_loss=0.0001569, whisper_loss=0.09066, over 2068913.28 frames. ], batch size: 92, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 16:58:47,932 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 18 from Vox, 52 fro AS 2024-08-14 16:59:10,985 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 16:59:16,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2755120.0, ans=0.07 2024-08-14 16:59:26,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2755120.0, ans=0.0 2024-08-14 16:59:48,796 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-14 16:59:50,264 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 16:59:57,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. 
limit=15.0 2024-08-14 17:00:03,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 200, loss[loss=0.1249, beats_loss=0.009591, ecapa_loss=0.0001687, whisper_loss=0.1136, over 22824.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009884, ecapa_loss=0.0001576, whisper_loss=0.09045, over 2427226.93 frames. ], batch size: 92, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:00:04,049 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 17:00:13,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2024-08-14 17:00:36,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.515e+01 2.860e+01 3.195e+01 5.864e+01, threshold=5.719e+01, percent-clipped=1.0 2024-08-14 17:00:40,017 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 17:00:40,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2755620.0, ans=0.04949747468305833 2024-08-14 17:00:43,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2755620.0, ans=0.125 2024-08-14 17:00:46,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-14 17:00:51,703 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 17:00:57,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2755720.0, ans=0.0 2024-08-14 17:01:18,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 250, loss[loss=0.09542, beats_loss=0.01101, ecapa_loss=0.0001452, whisper_loss=0.08296, over 22948.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.009967, ecapa_loss=0.0001567, whisper_loss=0.09078, over 2729799.65 frames. ], batch size: 91, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:01:21,431 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 17:01:24,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2755920.0, ans=0.2 2024-08-14 17:01:39,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2756020.0, ans=0.125 2024-08-14 17:02:02,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2756220.0, ans=0.125 2024-08-14 17:02:03,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2756220.0, ans=0.1 2024-08-14 17:02:07,630 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-14 17:02:07,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2756220.0, ans=0.125 2024-08-14 17:02:11,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2756220.0, ans=0.125 2024-08-14 17:02:30,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 300, loss[loss=0.1101, beats_loss=0.01046, ecapa_loss=0.0001261, whisper_loss=0.09837, over 22023.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0101, ecapa_loss=0.0001582, whisper_loss=0.09064, over 2974722.09 frames. ], batch size: 84, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:02:31,397 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
17 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 17:02:52,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2756520.0, ans=0.125 2024-08-14 17:02:56,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2024-08-14 17:03:00,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.324e+01 2.572e+01 2.876e+01 1.018e+02, threshold=5.143e+01, percent-clipped=1.0 2024-08-14 17:03:01,444 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 17:03:16,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2024-08-14 17:03:22,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2756720.0, ans=0.0 2024-08-14 17:03:24,705 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:03:41,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 350, loss[loss=0.1044, beats_loss=0.009722, ecapa_loss=0.0001528, whisper_loss=0.09318, over 16704.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01028, ecapa_loss=0.0001558, whisper_loss=0.08964, over 3155181.12 frames. ], batch size: 61, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 17:04:07,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2757020.0, ans=0.2 2024-08-14 17:04:12,272 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 24 from Vox, 26 from AS
2024-08-14 17:04:19,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2757120.0, ans=0.125
2024-08-14 17:04:48,636 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 17:04:52,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 400, loss[loss=0.08952, beats_loss=0.01109, ecapa_loss=0.0001411, whisper_loss=0.07702, over 17140.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001541, whisper_loss=0.08986, over 3310077.72 frames. ], batch size: 67, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:04:52,770 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS
2024-08-14 17:04:52,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2757420.0, ans=0.1
2024-08-14 17:05:23,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.550e+01 2.888e+01 2.244e+02, threshold=5.100e+01, percent-clipped=1.0
2024-08-14 17:05:27,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2757620.0, ans=0.125
2024-08-14 17:05:29,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2757620.0, ans=0.1
2024-08-14 17:05:29,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0
2024-08-14 17:05:31,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=22.5
2024-08-14 17:05:37,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757720.0, ans=0.0
2024-08-14 17:05:54,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2757820.0, ans=0.125
2024-08-14 17:05:54,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2757820.0, ans=0.125
2024-08-14 17:06:06,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2757920.0, ans=0.125
2024-08-14 17:06:07,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 450, loss[loss=0.07171, beats_loss=0.01026, ecapa_loss=0.0001466, whisper_loss=0.05998, over 15572.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.000154, whisper_loss=0.08968, over 3422427.86 frames. ], batch size: 58, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:06:09,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2757920.0, ans=0.2
2024-08-14 17:06:12,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0
2024-08-14 17:06:18,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2757920.0, ans=0.0
2024-08-14 17:06:46,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0
2024-08-14 17:06:47,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2758120.0, ans=0.0
2024-08-14 17:07:16,803 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 from AS
2024-08-14 17:07:29,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 500, loss[loss=0.08553, beats_loss=0.009427, ecapa_loss=0.0001889, whisper_loss=0.07421, over 18417.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.0001548, whisper_loss=0.08894, over 3531486.13 frames. ], batch size: 75, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:07:30,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2758420.0, ans=0.2
2024-08-14 17:07:41,647 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 from AS
2024-08-14 17:07:46,750 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 from AS
2024-08-14 17:08:03,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.300e+01 2.536e+01 2.836e+01 8.494e+01, threshold=5.071e+01, percent-clipped=3.0
2024-08-14 17:08:15,549 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 from AS
2024-08-14 17:08:28,129 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 from AS
2024-08-14 17:08:35,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2758820.0, ans=0.1
2024-08-14 17:08:51,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 550, loss[loss=0.1135, beats_loss=0.009638, ecapa_loss=0.0001592, whisper_loss=0.1023, over 20330.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001542, whisper_loss=0.08919, over 3614947.54 frames. ], batch size: 79, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:08:52,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2758920.0, ans=0.2
2024-08-14 17:09:11,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2759020.0, ans=0.2
2024-08-14 17:09:18,432 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 from AS
2024-08-14 17:09:22,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759020.0, ans=0.1
2024-08-14 17:09:27,947 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 from AS
2024-08-14 17:09:45,119 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS
2024-08-14 17:10:01,718 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 from AS
2024-08-14 17:10:05,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0
2024-08-14 17:10:09,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5
2024-08-14 17:10:13,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0
2024-08-14 17:10:17,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759420.0, ans=0.1
2024-08-14 17:10:18,420 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 600, loss[loss=0.113, beats_loss=0.0112, ecapa_loss=0.0001438, whisper_loss=0.1003, over 21736.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.08934, over 3654262.14 frames. ], batch size: 87, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:10:31,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0
2024-08-14 17:10:42,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2759520.0, ans=0.125
2024-08-14 17:10:46,894 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 from AS
2024-08-14 17:10:55,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.286e+01 2.611e+01 2.966e+01 2.824e+02, threshold=5.221e+01, percent-clipped=2.0
2024-08-14 17:11:03,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2759620.0, ans=0.0
2024-08-14 17:11:19,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2759720.0, ans=0.125
2024-08-14 17:11:25,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2759720.0, ans=0.125
2024-08-14 17:11:28,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2759820.0, ans=0.1
2024-08-14 17:11:33,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2759820.0, ans=0.0
2024-08-14 17:11:35,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2759820.0, ans=0.0
2024-08-14 17:11:38,960 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 from AS
2024-08-14 17:11:45,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 650, loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001698, whisper_loss=0.09222, over 22763.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001536, whisper_loss=0.08877, over 3715608.85 frames. ], batch size: 91, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:12:32,376 WARNING [optim.py:496] (3/4) Scaling gradients by 0.059259023517370224, model_norm_threshold=52.210243225097656
2024-08-14 17:12:32,548 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.800e+04, grad_sumsq=2.525e+04, orig_rms_sq=3.485e+00
2024-08-14 17:12:33,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2760120.0, ans=0.0
2024-08-14 17:12:45,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2760220.0, ans=0.1
2024-08-14 17:12:51,929 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 from AS
2024-08-14 17:13:02,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2760320.0, ans=0.1
2024-08-14 17:13:11,493 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 700, loss[loss=0.1085, beats_loss=0.008473, ecapa_loss=0.0001633, whisper_loss=0.09839, over 15870.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.000154, whisper_loss=0.08884, over 3720221.94 frames. ], batch size: 62, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:13:14,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2760420.0, ans=0.125
2024-08-14 17:13:44,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2760620.0, ans=0.125
2024-08-14 17:13:46,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.378e+01 2.624e+01 2.914e+01 8.811e+02, threshold=5.248e+01, percent-clipped=3.0
2024-08-14 17:13:48,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2760620.0, ans=0.1
2024-08-14 17:13:57,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2760620.0, ans=0.125
2024-08-14 17:13:57,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=12.0
2024-08-14 17:14:06,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.39 vs. limit=10.0
2024-08-14 17:14:16,309 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 from AS
2024-08-14 17:14:20,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.87 vs. limit=22.5
2024-08-14 17:14:27,157 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 17 from Vox, 36 from AS
2024-08-14 17:14:29,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2760820.0, ans=0.125
2024-08-14 17:14:36,169 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 750, loss[loss=0.1007, beats_loss=0.01092, ecapa_loss=0.0001538, whisper_loss=0.08822, over 21991.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001533, whisper_loss=0.0891, over 3763987.15 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:14:40,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2760920.0, ans=0.125
2024-08-14 17:14:50,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2760920.0, ans=0.125
2024-08-14 17:14:55,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2761020.0, ans=0.125
2024-08-14 17:15:01,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2761020.0, ans=0.0
2024-08-14 17:15:09,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2761120.0, ans=0.2
2024-08-14 17:15:17,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2761120.0, ans=10.0
2024-08-14 17:15:33,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2761220.0, ans=0.125
2024-08-14 17:16:00,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 800, loss[loss=0.1018, beats_loss=0.009267, ecapa_loss=0.0001568, whisper_loss=0.09099, over 17303.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.0001537, whisper_loss=0.08888, over 3759766.67 frames. ], batch size: 69, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:16:06,765 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 from AS
2024-08-14 17:16:11,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2761420.0, ans=0.0
2024-08-14 17:16:13,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2761420.0, ans=0.125
2024-08-14 17:16:32,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2761620.0, ans=0.0
2024-08-14 17:16:33,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.284e+01 2.457e+01 2.814e+01 4.816e+01, threshold=4.915e+01, percent-clipped=0.0
2024-08-14 17:16:38,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0
2024-08-14 17:16:52,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2761720.0, ans=0.125
2024-08-14 17:16:54,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2761720.0, ans=0.125
2024-08-14 17:17:18,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 850, loss[loss=0.1112, beats_loss=0.01305, ecapa_loss=0.0001174, whisper_loss=0.09697, over 16757.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01054, ecapa_loss=0.0001539, whisper_loss=0.08837, over 3769588.48 frames. ], batch size: 65, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:17:23,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2761920.0, ans=0.125
2024-08-14 17:17:32,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2761920.0, ans=0.0
2024-08-14 17:17:46,836 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 17:17:56,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2762120.0, ans=0.1
2024-08-14 17:18:03,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2762120.0, ans=0.125
2024-08-14 17:18:40,115 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-14 17:18:43,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 900, loss[loss=0.1033, beats_loss=0.01115, ecapa_loss=0.0001108, whisper_loss=0.09106, over 21045.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.0001532, whisper_loss=0.08847, over 3761230.39 frames. ], batch size: 78, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:18:54,048 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 13 from LS+wenet, 14 from Vox, 39 from AS
2024-08-14 17:19:12,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2762520.0, ans=0.125
2024-08-14 17:19:16,920 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 from AS
2024-08-14 17:19:20,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2762620.0, ans=0.125
2024-08-14 17:19:21,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.291e+01 2.536e+01 2.780e+01 9.206e+01, threshold=5.071e+01, percent-clipped=1.0
2024-08-14 17:19:29,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2762620.0, ans=0.1
2024-08-14 17:19:39,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2762720.0, ans=0.2
2024-08-14 17:19:54,746 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 from AS
2024-08-14 17:19:57,936 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 17 from Vox, 27 from AS
2024-08-14 17:20:00,655 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 22 from Vox, 48 from AS
2024-08-14 17:20:02,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2762820.0, ans=0.2
2024-08-14 17:20:03,359 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 from AS
2024-08-14 17:20:03,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2762820.0, ans=0.07
2024-08-14 17:20:06,496 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 950, loss[loss=0.1094, beats_loss=0.009841, ecapa_loss=0.0001427, whisper_loss=0.0981, over 16289.00 frames. ], tot_loss[loss=0.09987, beats_loss=0.01057, ecapa_loss=0.000153, whisper_loss=0.08778, over 3774129.60 frames. ], batch size: 61, lr: 3.16e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:21:23,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2763220.0, ans=0.2
2024-08-14 17:21:47,761 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 from AS
2024-08-14 17:21:51,115 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 from AS
2024-08-14 17:21:54,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1000, loss[loss=0.09771, beats_loss=0.009447, ecapa_loss=0.0001536, whisper_loss=0.08673, over 17364.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001526, whisper_loss=0.08852, over 3768860.85 frames. ], batch size: 67, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:22:14,376 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 from AS
2024-08-14 17:22:28,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=22.5
2024-08-14 17:22:29,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2763520.0, ans=0.125
2024-08-14 17:22:33,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.269e+01 2.540e+01 2.773e+01 4.748e+01, threshold=5.079e+01, percent-clipped=0.0
2024-08-14 17:22:40,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0
2024-08-14 17:22:56,931 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 17:22:59,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2763720.0, ans=0.05
2024-08-14 17:23:06,756 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 from AS
2024-08-14 17:23:36,639 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1050, loss[loss=0.0786, beats_loss=0.01531, ecapa_loss=0.0001592, whisper_loss=0.06169, over 15691.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001518, whisper_loss=0.08849, over 3782873.54 frames. ], batch size: 65, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:23:50,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5
2024-08-14 17:23:52,144 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 29 from Vox, 28 from AS
2024-08-14 17:23:56,762 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 from AS
2024-08-14 17:24:04,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2764020.0, ans=0.1
2024-08-14 17:24:16,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2764020.0, ans=0.125
2024-08-14 17:24:53,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5
2024-08-14 17:24:57,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0
2024-08-14 17:25:01,885 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 from AS
2024-08-14 17:25:08,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2764220.0, ans=0.2
2024-08-14 17:25:36,567 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1100, loss[loss=0.1113, beats_loss=0.01189, ecapa_loss=0.0001312, whisper_loss=0.09811, over 23603.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.08831, over 3814635.70 frames. ], batch size: 92, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:25:38,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2764420.0, ans=0.125
2024-08-14 17:25:40,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764420.0, ans=0.1
2024-08-14 17:26:17,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0
2024-08-14 17:26:29,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.358e+01 2.558e+01 2.908e+01 1.671e+02, threshold=5.116e+01, percent-clipped=1.0
2024-08-14 17:26:33,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2764620.0, ans=0.125
2024-08-14 17:26:46,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2764620.0, ans=0.2
2024-08-14 17:26:51,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=12.0
2024-08-14 17:27:11,303 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 14 from Vox, 41 from AS
2024-08-14 17:27:39,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1150, loss[loss=0.1149, beats_loss=0.01096, ecapa_loss=0.000139, whisper_loss=0.1025, over 15796.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01065, ecapa_loss=0.0001514, whisper_loss=0.08868, over 3822059.85 frames. ], batch size: 57, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:28:13,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2765020.0, ans=0.125
2024-08-14 17:28:15,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2765020.0, ans=0.125
2024-08-14 17:28:23,947 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 14 from Vox, 33 from AS
2024-08-14 17:28:29,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2765120.0, ans=0.1
2024-08-14 17:28:57,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2765220.0, ans=0.0
2024-08-14 17:29:19,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5
2024-08-14 17:29:24,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1200, loss[loss=0.104, beats_loss=0.009035, ecapa_loss=0.0001744, whisper_loss=0.09322, over 19910.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001513, whisper_loss=0.08946, over 3780833.41 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:29:36,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2765420.0, ans=0.125
2024-08-14 17:29:46,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2765520.0, ans=0.0
2024-08-14 17:29:53,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2765620.0, ans=0.125
2024-08-14 17:29:54,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.420e+01 2.672e+01 3.121e+01 5.638e+01, threshold=5.344e+01, percent-clipped=1.0
2024-08-14 17:30:17,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2765720.0, ans=0.2
2024-08-14 17:30:19,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5
2024-08-14 17:30:38,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1250, loss[loss=0.1012, beats_loss=0.009972, ecapa_loss=0.000162, whisper_loss=0.08956, over 19818.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01066, ecapa_loss=0.0001508, whisper_loss=0.08893, over 3760540.69 frames. ], batch size: 79, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:30:44,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0
2024-08-14 17:31:10,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0
2024-08-14 17:31:53,951 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS
2024-08-14 17:31:54,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0
2024-08-14 17:31:58,363 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1300, loss[loss=0.08872, beats_loss=0.01138, ecapa_loss=0.0001254, whisper_loss=0.07609, over 14183.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01068, ecapa_loss=0.0001504, whisper_loss=0.08896, over 3764265.23 frames. ], batch size: 54, lr: 3.15e-03, grad_scale: 2.8823037615171174e+17
2024-08-14 17:32:09,576 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 11 from Vox, 28 from AS
2024-08-14 17:32:13,324 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-14 17:32:18,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0
2024-08-14 17:32:31,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.335e+01 2.518e+01 2.895e+01 4.834e+01, threshold=5.035e+01, percent-clipped=0.0
2024-08-14 17:32:40,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2766620.0, ans=0.125
2024-08-14 17:32:41,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2766620.0, ans=0.125
2024-08-14 17:32:43,816 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 17 from Vox, 38 from AS
2024-08-14 17:32:48,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2766720.0, ans=0.0
2024-08-14 17:32:54,482 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-14 17:33:00,790 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 from AS
2024-08-14 17:33:05,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2766820.0, ans=0.5
2024-08-14 17:33:17,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1350, loss[loss=0.1144, beats_loss=0.0114, ecapa_loss=0.0001644, whisper_loss=0.1014, over 14753.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0108, ecapa_loss=0.0001494, whisper_loss=0.08853, over 3768474.23 frames. ], batch size: 57, lr: 3.15e-03, grad_scale: 5.764607523034235e+17
2024-08-14 17:33:24,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2766920.0, ans=0.125
2024-08-14 17:33:45,227 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 from AS
2024-08-14 17:34:39,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2767320.0, ans=0.125
2024-08-14 17:34:42,724 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1400, loss[loss=0.08042, beats_loss=0.01099, ecapa_loss=0.0002044, whisper_loss=0.06739, over 19524.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.0892, over 3778644.32 frames. ], batch size: 88, lr: 3.15e-03, grad_scale: 5.764607523034235e+17
2024-08-14 17:34:45,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2767420.0, ans=0.125
2024-08-14 17:35:04,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2767520.0, ans=0.0
2024-08-14 17:35:10,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0
2024-08-14 17:35:18,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.291e+01 2.563e+01 2.822e+01 1.881e+02, threshold=5.126e+01, percent-clipped=2.0
2024-08-14 17:35:36,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5
2024-08-14 17:35:48,435 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 20 from Vox, 27 from AS
2024-08-14 17:35:53,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2767820.0, ans=0.1
2024-08-14 17:36:41,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1450, loss[loss=0.07698, beats_loss=0.01229, ecapa_loss=0.0001633, whisper_loss=0.06306, over 22220.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01071, ecapa_loss=0.0001487, whisper_loss=0.08878, over 3805145.26 frames. ], batch size: 96, lr: 3.15e-03, grad_scale: 5.764607523034235e+17
2024-08-14 17:37:18,822 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-14 17:37:18,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2768120.0, ans=10.0
2024-08-14 17:37:19,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.99 vs. limit=10.0
2024-08-14 17:37:25,499 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 from AS
2024-08-14 17:37:47,283 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 from AS
2024-08-14 17:37:52,253 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS
2024-08-14 17:38:03,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1500, loss[loss=0.1262, beats_loss=0.008802, ecapa_loss=0.0001786, whisper_loss=0.1156, over 19412.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01067, ecapa_loss=0.0001489, whisper_loss=0.08898, over 3823692.27 frames. ], batch size: 75, lr: 3.15e-03, grad_scale: 5.764607523034235e+17
2024-08-14 17:38:15,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2768420.0, ans=0.0
2024-08-14 17:38:18,902 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 from AS
2024-08-14 17:38:19,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2768520.0, ans=0.0
2024-08-14 17:38:21,823 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 from AS
2024-08-14 17:38:27,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2768520.0, ans=0.2
2024-08-14 17:38:28,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768520.0, ans=0.1
2024-08-14 17:38:28,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2768520.0, ans=0.0
2024-08-14 17:38:28,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2768520.0, ans=0.1
2024-08-14 17:38:37,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.247e+01 2.495e+01 2.740e+01 8.085e+01, threshold=4.990e+01, percent-clipped=1.0
2024-08-14 17:38:48,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0
2024-08-14 17:38:50,827 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 from AS
2024-08-14 17:38:52,543 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 from AS
2024-08-14 17:38:53,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0
2024-08-14 17:38:59,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2768720.0, ans=0.0
2024-08-14 17:39:00,385 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS
2024-08-14 17:39:03,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2768720.0, ans=0.125
2024-08-14 17:39:11,011 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 19 from Vox, 51 from AS
2024-08-14 17:39:11,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2768820.0, ans=0.09899494936611666
2024-08-14 17:39:12,469 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 from AS
2024-08-14 17:39:26,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1550, loss[loss=0.1105, beats_loss=0.009249, ecapa_loss=0.0001481, whisper_loss=0.09973, over 18447.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01066, ecapa_loss=0.0001485, whisper_loss=0.08916, over 3814991.31 frames. ], batch size: 72, lr: 3.15e-03, grad_scale: 5.764607523034235e+17
2024-08-14 17:39:33,938 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-14 17:39:39,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2768920.0, ans=0.1
2024-08-14 17:39:49,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0
2024-08-14 17:39:55,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2769020.0, ans=0.1
2024-08-14 17:40:11,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2769220.0, ans=0.0
2024-08-14 17:40:12,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0
2024-08-14 17:40:37,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2769320.0, ans=0.0
2024-08-14 17:40:37,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2769320.0, ans=0.1
2024-08-14 17:40:45,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1600, loss[loss=0.1075, beats_loss=0.01174, ecapa_loss=0.0001522, whisper_loss=0.09423, over 22674.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01068, ecapa_loss=0.0001487, whisper_loss=0.08919, over 3842827.91 frames. ], batch size: 91, lr: 3.15e-03, grad_scale: 5.764607523034235e+17
2024-08-14 17:41:09,682 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 from AS
2024-08-14 17:41:14,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2769520.0, ans=0.125
2024-08-14 17:41:15,613 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 17:41:17,987 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.359e+01 2.603e+01 2.856e+01 4.128e+01, threshold=5.205e+01, percent-clipped=0.0 2024-08-14 17:41:22,471 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.516e-01 2024-08-14 17:41:24,784 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-14 17:42:01,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1650, loss[loss=0.104, beats_loss=0.009385, ecapa_loss=0.0001747, whisper_loss=0.09292, over 13840.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01067, ecapa_loss=0.0001489, whisper_loss=0.08917, over 3813906.64 frames. ], batch size: 56, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:42:02,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2769920.0, ans=0.1 2024-08-14 17:42:07,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2769920.0, ans=0.125 2024-08-14 17:42:14,149 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 17:42:29,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2770020.0, ans=0.125 2024-08-14 17:42:41,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2770120.0, ans=0.0 2024-08-14 17:43:14,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.38 vs. limit=10.0 2024-08-14 17:43:18,620 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1700, loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001511, whisper_loss=0.09134, over 22829.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.0899, over 3807576.62 frames. ], batch size: 93, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:43:26,460 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-14 17:43:51,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.335e+01 2.576e+01 2.933e+01 1.462e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-14 17:43:52,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2770620.0, ans=0.125 2024-08-14 17:43:53,758 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-14 17:44:10,427 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 17:44:11,839 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 17:44:31,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2770820.0, ans=0.0 2024-08-14 17:44:34,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1750, loss[loss=0.08473, beats_loss=0.009454, ecapa_loss=0.0002067, whisper_loss=0.07321, over 16075.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001498, whisper_loss=0.09026, over 3824728.53 frames. ], batch size: 65, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:44:36,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770920.0, ans=0.1 2024-08-14 17:44:36,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2770920.0, ans=0.125 2024-08-14 17:44:39,459 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 17:44:52,740 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-14 17:44:54,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2771020.0, ans=0.2 2024-08-14 17:45:01,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2771020.0, ans=0.125 2024-08-14 17:45:04,848 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-14 17:45:05,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-14 17:45:06,481 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 17:45:08,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-14 17:45:38,376 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 17:45:50,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1800, loss[loss=0.069, beats_loss=0.01267, ecapa_loss=0.0001537, whisper_loss=0.0548, over 19450.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001499, whisper_loss=0.09031, over 3818980.42 frames. 
], batch size: 80, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:46:18,489 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.914e+00 2024-08-14 17:46:22,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.297e+01 2.564e+01 2.917e+01 8.345e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-14 17:46:37,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2771720.0, ans=0.125 2024-08-14 17:46:37,980 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-14 17:46:40,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2771720.0, ans=0.125 2024-08-14 17:47:06,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1850, loss[loss=0.1291, beats_loss=0.005926, ecapa_loss=0.0001824, whisper_loss=0.1214, over 17903.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001498, whisper_loss=0.09062, over 3822628.40 frames. ], batch size: 66, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:47:35,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2772120.0, ans=0.2 2024-08-14 17:47:37,792 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-14 17:47:57,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2772220.0, ans=0.0 2024-08-14 17:48:15,881 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 17:48:19,183 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-14 17:48:19,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2772320.0, ans=0.0 2024-08-14 17:48:21,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1900, loss[loss=0.0997, beats_loss=0.01013, ecapa_loss=0.0001681, whisper_loss=0.08789, over 18494.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001507, whisper_loss=0.08951, over 3800228.19 frames. ], batch size: 74, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:48:31,368 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 27 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-14 17:48:45,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2772520.0, ans=0.0 2024-08-14 17:48:48,629 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 17:48:54,143 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.272e+01 2.538e+01 2.800e+01 8.979e+01, threshold=5.075e+01, percent-clipped=2.0 2024-08-14 17:49:02,291 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 17:49:37,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 1950, loss[loss=0.129, beats_loss=0.009215, ecapa_loss=0.0001622, whisper_loss=0.1182, over 23717.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001517, whisper_loss=0.08996, over 3795785.65 frames. ], batch size: 94, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:50:16,534 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 17:50:18,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2773120.0, ans=0.125 2024-08-14 17:50:21,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2773120.0, ans=0.0 2024-08-14 17:50:25,253 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 17:50:56,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2000, loss[loss=0.07822, beats_loss=0.01377, ecapa_loss=0.0001509, whisper_loss=0.06294, over 14126.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001516, whisper_loss=0.09053, over 3794414.66 frames. ], batch size: 59, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:51:03,328 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.186e-02 2024-08-14 17:51:29,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.380e+01 2.636e+01 2.886e+01 1.186e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 17:51:41,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-14 17:51:43,900 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-14 17:51:44,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2773720.0, ans=0.0 2024-08-14 17:51:54,427 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 17:52:14,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2050, loss[loss=0.09038, beats_loss=0.01044, ecapa_loss=0.0001667, whisper_loss=0.07827, over 16371.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001503, whisper_loss=0.09015, over 3812143.94 frames. ], batch size: 67, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:52:26,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2773920.0, ans=0.125 2024-08-14 17:52:50,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-08-14 17:53:03,873 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 17:53:07,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-14 17:53:10,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2774220.0, ans=0.125 2024-08-14 17:53:23,970 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.291e-01 2024-08-14 17:53:27,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2774320.0, ans=0.0 2024-08-14 17:53:29,380 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 17:53:30,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2100, loss[loss=0.1099, beats_loss=0.009768, ecapa_loss=0.0001435, whisper_loss=0.09872, over 22744.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001501, whisper_loss=0.09004, over 3789123.59 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:54:02,010 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
12 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-14 17:54:03,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.339e+01 2.575e+01 2.832e+01 4.254e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-14 17:54:22,570 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-14 17:54:48,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2150, loss[loss=0.1069, beats_loss=0.01178, ecapa_loss=0.0001626, whisper_loss=0.09347, over 22644.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001485, whisper_loss=0.08978, over 3813049.90 frames. ], batch size: 92, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:54:49,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2774920.0, ans=0.125 2024-08-14 17:54:50,564 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 30 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 17:55:00,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2774920.0, ans=0.035 2024-08-14 17:56:06,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2200, loss[loss=0.1121, beats_loss=0.009395, ecapa_loss=0.0001758, whisper_loss=0.101, over 21428.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.09076, over 3832499.31 frames. ], batch size: 90, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:56:15,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.47 vs. 
limit=12.0 2024-08-14 17:56:18,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2775420.0, ans=0.125 2024-08-14 17:56:23,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2024-08-14 17:56:36,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.388e+01 2.686e+01 3.163e+01 6.240e+01, threshold=5.371e+01, percent-clipped=1.0 2024-08-14 17:56:55,399 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 17:57:05,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-08-14 17:57:05,848 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 17:57:17,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2775820.0, ans=0.125 2024-08-14 17:57:21,399 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2250, loss[loss=0.1172, beats_loss=0.008478, ecapa_loss=0.0001733, whisper_loss=0.1069, over 22116.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001494, whisper_loss=0.09119, over 3868420.32 frames. ], batch size: 88, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:57:45,395 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 17:57:46,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2776020.0, ans=0.125 2024-08-14 17:57:56,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.64 vs. limit=6.0 2024-08-14 17:58:00,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2776120.0, ans=0.0 2024-08-14 17:58:14,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2776220.0, ans=0.125 2024-08-14 17:58:33,901 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-14 17:58:39,585 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 27 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-14 17:58:40,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2300, loss[loss=0.135, beats_loss=0.007949, ecapa_loss=0.0001782, whisper_loss=0.1253, over 16180.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.09107, over 3870935.35 frames. ], batch size: 63, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 17:58:48,628 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 17:58:54,733 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-14 17:59:01,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0 2024-08-14 17:59:10,018 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 17:59:12,863 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.401e+01 2.652e+01 3.055e+01 1.168e+02, threshold=5.304e+01, percent-clipped=4.0 2024-08-14 17:59:34,492 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 17:59:37,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2776720.0, ans=0.0 2024-08-14 17:59:57,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2350, loss[loss=0.102, beats_loss=0.01022, ecapa_loss=0.0001191, whisper_loss=0.09057, over 18919.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001507, whisper_loss=0.09049, over 3881703.94 frames. ], batch size: 69, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:00:11,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-08-14 18:00:12,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2777020.0, ans=0.125 2024-08-14 18:00:17,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2777020.0, ans=0.2 2024-08-14 18:00:18,122 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 18:00:30,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2777120.0, ans=0.125 2024-08-14 18:00:35,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2777120.0, ans=0.1 2024-08-14 18:00:47,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2777220.0, ans=0.0 2024-08-14 18:00:48,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2777220.0, ans=0.1 2024-08-14 18:00:57,839 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 18:00:58,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2777220.0, ans=0.0 2024-08-14 18:01:02,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2777320.0, ans=0.07 2024-08-14 18:01:11,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-14 18:01:16,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2777320.0, ans=0.2 2024-08-14 18:01:19,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2400, loss[loss=0.09208, beats_loss=0.01213, ecapa_loss=0.0001225, whisper_loss=0.07873, over 21457.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001513, whisper_loss=0.09098, over 3846703.98 frames. 
], batch size: 85, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:01:25,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2777420.0, ans=0.125 2024-08-14 18:01:27,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2024-08-14 18:01:36,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2777520.0, ans=0.0 2024-08-14 18:01:40,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2777520.0, ans=0.125 2024-08-14 18:01:52,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-14 18:01:52,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.341e+01 2.588e+01 3.015e+01 2.629e+02, threshold=5.175e+01, percent-clipped=2.0 2024-08-14 18:01:53,622 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-14 18:01:54,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2777620.0, ans=0.0 2024-08-14 18:01:59,959 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 18:02:13,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2777720.0, ans=0.1 2024-08-14 18:02:42,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2450, loss[loss=0.1229, beats_loss=0.008426, ecapa_loss=0.0001967, whisper_loss=0.1125, over 21832.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001514, whisper_loss=0.09101, over 3862055.36 frames. 
], batch size: 88, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:02:54,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=15.0 2024-08-14 18:03:18,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2778120.0, ans=0.125 2024-08-14 18:03:39,050 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-14 18:03:44,016 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-14 18:03:46,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2778320.0, ans=0.0 2024-08-14 18:03:48,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2778320.0, ans=0.0 2024-08-14 18:04:01,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2778320.0, ans=0.0 2024-08-14 18:04:03,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2500, loss[loss=0.09233, beats_loss=0.01187, ecapa_loss=0.0001664, whisper_loss=0.0788, over 21624.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001525, whisper_loss=0.09061, over 3867842.72 frames. 
], batch size: 90, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:04:07,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2778420.0, ans=0.1 2024-08-14 18:04:12,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2778420.0, ans=0.0 2024-08-14 18:04:35,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2778520.0, ans=0.0 2024-08-14 18:04:39,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.393e+01 2.682e+01 2.958e+01 4.919e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-14 18:04:48,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-14 18:04:59,979 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-14 18:05:07,605 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 18:05:10,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.82 vs. limit=15.0 2024-08-14 18:05:14,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2778820.0, ans=0.1 2024-08-14 18:05:16,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2778820.0, ans=0.1 2024-08-14 18:05:19,800 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
16 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 18:05:24,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2550, loss[loss=0.1132, beats_loss=0.01009, ecapa_loss=0.0001476, whisper_loss=0.1017, over 22855.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001525, whisper_loss=0.09045, over 3856067.38 frames. ], batch size: 92, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:05:31,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2778920.0, ans=0.0 2024-08-14 18:05:38,546 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 18:05:48,371 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 18:06:14,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2779220.0, ans=0.0 2024-08-14 18:06:28,917 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:06:35,280 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 18:06:46,525 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2600, loss[loss=0.08397, beats_loss=0.0119, ecapa_loss=0.0001225, whisper_loss=0.07085, over 14219.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001519, whisper_loss=0.09022, over 3820629.79 frames. 
], batch size: 55, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:06:52,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2779420.0, ans=0.125 2024-08-14 18:06:52,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2779420.0, ans=0.125 2024-08-14 18:06:57,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2779420.0, ans=0.1 2024-08-14 18:07:00,807 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-14 18:07:08,714 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 18:07:14,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2779520.0, ans=0.0 2024-08-14 18:07:16,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2779520.0, ans=0.0 2024-08-14 18:07:20,238 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 18:07:20,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2779620.0, ans=0.125 2024-08-14 18:07:21,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.268e+01 2.541e+01 2.782e+01 4.582e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-14 18:07:30,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2779620.0, ans=0.1 2024-08-14 18:07:44,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2779720.0, ans=0.125 2024-08-14 18:07:48,084 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 18:08:07,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2650, loss[loss=0.08393, beats_loss=0.01024, ecapa_loss=0.0001773, whisper_loss=0.07192, over 13873.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001538, whisper_loss=0.09066, over 3836034.89 frames. ], batch size: 55, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:08:09,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2779920.0, ans=0.125 2024-08-14 18:08:12,641 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 18:08:25,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2780020.0, ans=0.125 2024-08-14 18:08:44,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2024-08-14 18:09:20,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2780320.0, ans=0.125 2024-08-14 18:09:29,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2700, loss[loss=0.1123, beats_loss=0.01036, ecapa_loss=0.0001384, whisper_loss=0.1006, over 19164.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001533, whisper_loss=0.09034, over 3807606.47 frames. ], batch size: 74, lr: 3.15e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:09:44,700 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 18:09:46,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2780520.0, ans=0.1 2024-08-14 18:09:54,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2780520.0, ans=0.125 2024-08-14 18:10:01,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0 2024-08-14 18:10:03,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.336e+01 2.550e+01 2.927e+01 5.134e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-14 18:10:04,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. 
limit=10.0 2024-08-14 18:10:12,227 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:10:13,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2780620.0, ans=0.025 2024-08-14 18:10:18,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2780720.0, ans=0.2 2024-08-14 18:10:20,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2024-08-14 18:10:26,325 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 18:10:52,257 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2750, loss[loss=0.09307, beats_loss=0.009511, ecapa_loss=0.0001531, whisper_loss=0.08203, over 17987.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001526, whisper_loss=0.09033, over 3829741.22 frames. ], batch size: 68, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:10:52,468 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 18:11:01,640 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-14 18:11:08,396 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-14 18:11:16,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2781020.0, ans=0.125 2024-08-14 18:11:30,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2781120.0, ans=0.125 2024-08-14 18:11:32,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=22.5 2024-08-14 18:11:46,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2781220.0, ans=0.2 2024-08-14 18:11:47,441 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 18:11:52,623 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-14 18:11:58,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2781320.0, ans=0.1 2024-08-14 18:12:11,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2781320.0, ans=0.05 2024-08-14 18:12:15,337 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.461e-02 2024-08-14 18:12:16,185 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2800, loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001835, whisper_loss=0.09, over 20769.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001524, whisper_loss=0.08994, over 3843859.30 frames. 
], batch size: 84, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:12:24,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2781420.0, ans=0.125 2024-08-14 18:12:25,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781420.0, ans=0.1 2024-08-14 18:12:29,531 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 18:12:33,290 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-14 18:12:44,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2781520.0, ans=0.125 2024-08-14 18:12:48,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.377e+01 2.677e+01 2.938e+01 4.458e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-14 18:12:59,207 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-14 18:13:05,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2781720.0, ans=0.1 2024-08-14 18:13:07,509 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-14 18:13:30,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2781820.0, ans=0.125 2024-08-14 18:13:33,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2850, loss[loss=0.08235, beats_loss=0.01296, ecapa_loss=0.0001241, whisper_loss=0.06815, over 19445.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001528, whisper_loss=0.08994, over 3826497.16 frames. 
], batch size: 78, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:13:38,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2781920.0, ans=0.09899494936611666 2024-08-14 18:13:47,875 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 18:13:49,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2782020.0, ans=0.2 2024-08-14 18:13:51,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2782020.0, ans=22.5 2024-08-14 18:14:36,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2782320.0, ans=0.125 2024-08-14 18:14:41,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2782320.0, ans=0.1 2024-08-14 18:14:48,074 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2900, loss[loss=0.1266, beats_loss=0.009633, ecapa_loss=0.0001378, whisper_loss=0.1156, over 24175.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001526, whisper_loss=0.08996, over 3849127.71 frames. ], batch size: 92, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:14:49,464 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 18:15:01,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2782520.0, ans=0.0 2024-08-14 18:15:15,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2782520.0, ans=0.125 2024-08-14 18:15:18,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.302e+01 2.501e+01 2.806e+01 3.501e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 18:15:38,174 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 18:15:39,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-14 18:16:00,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2782820.0, ans=0.125 2024-08-14 18:16:03,386 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 2950, loss[loss=0.116, beats_loss=0.01237, ecapa_loss=0.0001207, whisper_loss=0.1024, over 14194.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01076, ecapa_loss=0.0001526, whisper_loss=0.08945, over 3843630.52 frames. 
], batch size: 54, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:16:13,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2782920.0, ans=0.125 2024-08-14 18:16:19,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2783020.0, ans=0.125 2024-08-14 18:16:41,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2783120.0, ans=0.0 2024-08-14 18:16:54,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2783220.0, ans=0.2 2024-08-14 18:17:18,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3000, loss[loss=0.1003, beats_loss=0.01254, ecapa_loss=0.0001678, whisper_loss=0.0861, over 22331.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001528, whisper_loss=0.09014, over 3826618.21 frames. ], batch size: 94, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:17:18,101 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 18:17:58,808 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on ASR_libri: loss=0.2511, beats_loss=0, ecapa_loss=0.0005401, whisper_loss=0.2457, over 922467.00 frames. 2024-08-14 18:18:19,602 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on SV_voxceleb1: loss=0.004329, beats_loss=0, ecapa_loss=0.0004329, whisper_loss=0, over 939242.00 frames. 2024-08-14 18:20:16,620 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on AT_audioset: loss=0.02338, beats_loss=0.02338, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 18:20:16,623 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 18:20:17,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2783420.0, ans=0.125 2024-08-14 18:20:44,562 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 18:20:48,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.403e+01 2.631e+01 2.938e+01 2.975e+02, threshold=5.261e+01, percent-clipped=1.0 2024-08-14 18:21:00,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-14 18:21:08,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=12.0 2024-08-14 18:21:23,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.28 vs. limit=15.0 2024-08-14 18:21:30,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3050, loss[loss=0.1475, beats_loss=0.00749, ecapa_loss=0.0001583, whisper_loss=0.1384, over 24357.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001526, whisper_loss=0.09031, over 3817024.41 frames. ], batch size: 92, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:21:32,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2783920.0, ans=0.125 2024-08-14 18:21:37,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-14 18:21:45,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. 
limit=22.5 2024-08-14 18:22:07,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784120.0, ans=0.1 2024-08-14 18:22:08,025 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:22:21,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784220.0, ans=0.1 2024-08-14 18:22:32,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2784320.0, ans=0.05 2024-08-14 18:22:46,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3100, loss[loss=0.09777, beats_loss=0.01088, ecapa_loss=0.0001729, whisper_loss=0.08516, over 21787.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.000154, whisper_loss=0.09005, over 3822769.58 frames. ], batch size: 93, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:22:48,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2784420.0, ans=0.125 2024-08-14 18:22:51,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2784420.0, ans=0.0 2024-08-14 18:22:55,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=12.0 2024-08-14 18:23:16,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.365e+01 2.545e+01 2.848e+01 4.706e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-14 18:23:29,376 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 18:23:29,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=2784720.0, ans=0.2 2024-08-14 18:23:32,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2784720.0, ans=0.1 2024-08-14 18:23:39,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784720.0, ans=0.1 2024-08-14 18:23:46,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2784820.0, ans=0.125 2024-08-14 18:23:56,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3150, loss[loss=0.08956, beats_loss=0.0121, ecapa_loss=0.0001392, whisper_loss=0.07607, over 18943.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001557, whisper_loss=0.09022, over 3824108.77 frames. ], batch size: 75, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:23:59,826 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 14 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-14 18:24:03,844 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 18:24:04,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2784920.0, ans=0.125 2024-08-14 18:24:09,281 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 18:24:27,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2785120.0, ans=0.125 2024-08-14 18:24:32,436 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-14 18:24:33,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2785120.0, ans=0.125 2024-08-14 18:24:57,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2785320.0, ans=0.2 2024-08-14 18:24:59,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2785320.0, ans=0.125 2024-08-14 18:25:06,074 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3200, loss[loss=0.09034, beats_loss=0.01281, ecapa_loss=0.0001599, whisper_loss=0.07593, over 17947.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001559, whisper_loss=0.09044, over 3801694.54 frames. ], batch size: 74, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:25:13,118 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 18:25:13,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2785420.0, ans=0.0 2024-08-14 18:25:14,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2785420.0, ans=0.1 2024-08-14 18:25:20,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2785520.0, ans=0.1 2024-08-14 18:25:26,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0 2024-08-14 18:25:35,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.274e+01 2.528e+01 2.834e+01 7.598e+01, threshold=5.056e+01, percent-clipped=2.0 2024-08-14 18:25:46,687 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
20 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 18:25:50,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-08-14 18:25:53,469 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 18:25:58,593 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-14 18:26:15,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3250, loss[loss=0.1195, beats_loss=0.007881, ecapa_loss=0.0002252, whisper_loss=0.1094, over 16720.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01062, ecapa_loss=0.0001563, whisper_loss=0.0909, over 3809133.16 frames. ], batch size: 72, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:26:36,999 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 18:26:37,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2786020.0, ans=0.125 2024-08-14 18:27:20,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2786320.0, ans=0.0 2024-08-14 18:27:20,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2786320.0, ans=0.1 2024-08-14 18:27:22,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3300, loss[loss=0.1059, beats_loss=0.01156, ecapa_loss=0.0001913, whisper_loss=0.09243, over 22154.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001561, whisper_loss=0.0907, over 3839525.42 frames. ], batch size: 93, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:27:25,592 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 18:27:51,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.316e+01 2.463e+01 2.771e+01 4.814e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-14 18:27:56,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2786620.0, ans=0.125 2024-08-14 18:27:56,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2786620.0, ans=0.04949747468305833 2024-08-14 18:28:05,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2786720.0, ans=10.0 2024-08-14 18:28:19,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2786820.0, ans=0.125 2024-08-14 18:28:25,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2786820.0, ans=0.125 2024-08-14 18:28:30,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3350, loss[loss=0.09814, beats_loss=0.008783, ecapa_loss=0.0001608, whisper_loss=0.08775, over 23184.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001552, whisper_loss=0.09066, over 3856286.59 frames. ], batch size: 94, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:28:37,531 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-14 18:28:43,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2787020.0, ans=0.1 2024-08-14 18:28:44,470 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-14 18:29:03,722 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-14 18:29:07,828 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 18:29:15,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2787220.0, ans=0.2 2024-08-14 18:29:21,945 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-14 18:29:26,060 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-14 18:29:28,829 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 18:29:30,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2787320.0, ans=0.05 2024-08-14 18:29:33,129 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 18:29:39,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3400, loss[loss=0.1008, beats_loss=0.01178, ecapa_loss=0.0001746, whisper_loss=0.08726, over 21924.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.000155, whisper_loss=0.09101, over 3882496.73 frames. ], batch size: 94, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:29:40,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2787420.0, ans=0.0 2024-08-14 18:29:46,689 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-14 18:29:49,273 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 18:30:07,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.360e+01 2.659e+01 3.040e+01 2.409e+02, threshold=5.318e+01, percent-clipped=1.0 2024-08-14 18:30:38,797 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 18:30:39,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2787820.0, ans=0.0 2024-08-14 18:30:42,812 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 18:30:48,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3450, loss[loss=0.1319, beats_loss=0.008735, ecapa_loss=0.000135, whisper_loss=0.1219, over 16269.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001547, whisper_loss=0.09033, over 3864137.37 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:31:09,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2788020.0, ans=0.035 2024-08-14 18:31:13,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2788120.0, ans=0.0 2024-08-14 18:31:24,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2788120.0, ans=0.0 2024-08-14 18:31:33,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2788220.0, ans=0.1 2024-08-14 18:31:47,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2788320.0, ans=0.0 2024-08-14 18:31:55,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3500, loss[loss=0.1104, beats_loss=0.009668, ecapa_loss=0.0001583, whisper_loss=0.09917, over 16648.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01065, ecapa_loss=0.0001551, whisper_loss=0.09097, over 3881710.77 frames. ], batch size: 64, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:32:03,440 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-14 18:32:12,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2788520.0, ans=0.1 2024-08-14 18:32:15,148 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 18:32:23,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.371e+01 2.585e+01 2.886e+01 6.376e+01, threshold=5.170e+01, percent-clipped=1.0 2024-08-14 18:32:25,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2788620.0, ans=0.0 2024-08-14 18:32:29,186 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-14 18:32:31,835 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-14 18:32:48,415 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 26 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-14 18:32:56,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2788820.0, ans=0.1 2024-08-14 18:33:01,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2024-08-14 18:33:03,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3550, loss[loss=0.09299, beats_loss=0.01315, ecapa_loss=0.0001321, whisper_loss=0.07852, over 21234.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001546, whisper_loss=0.09171, over 3913028.61 frames. 
], batch size: 84, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:33:05,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2788920.0, ans=0.125 2024-08-14 18:33:11,725 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 18:33:27,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2789020.0, ans=0.125 2024-08-14 18:33:39,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2789120.0, ans=0.2 2024-08-14 18:33:45,774 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 18:34:09,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2789320.0, ans=0.1 2024-08-14 18:34:10,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2024-08-14 18:34:11,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3600, loss[loss=0.1195, beats_loss=0.008283, ecapa_loss=0.0001723, whisper_loss=0.1095, over 17149.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01063, ecapa_loss=0.0001541, whisper_loss=0.09215, over 3884346.16 frames. ], batch size: 69, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:34:12,829 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-14 18:34:14,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2789420.0, ans=0.125 2024-08-14 18:34:16,023 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:34:25,144 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 18:34:30,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2789520.0, ans=0.125 2024-08-14 18:34:39,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.323e+01 2.540e+01 2.892e+01 4.287e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-14 18:34:53,668 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-14 18:34:54,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.63 vs. limit=22.5 2024-08-14 18:34:56,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2789720.0, ans=0.125 2024-08-14 18:35:05,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2789820.0, ans=0.0 2024-08-14 18:35:09,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2789820.0, ans=0.1 2024-08-14 18:35:13,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2789820.0, ans=0.125 2024-08-14 18:35:16,872 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 18:35:19,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3650, loss[loss=0.08046, beats_loss=0.01356, ecapa_loss=0.0001372, whisper_loss=0.06553, over 15219.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01065, ecapa_loss=0.0001542, whisper_loss=0.09179, over 3889732.67 frames. ], batch size: 65, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:35:23,265 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 18:35:32,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2790020.0, ans=0.2 2024-08-14 18:36:07,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=12.0 2024-08-14 18:36:18,692 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-14 18:36:22,381 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 18:36:25,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2790420.0, ans=0.1 2024-08-14 18:36:26,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3700, loss[loss=0.102, beats_loss=0.009701, ecapa_loss=0.0001939, whisper_loss=0.09041, over 21217.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001541, whisper_loss=0.09157, over 3894605.70 frames. 
], batch size: 89, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:36:34,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2790420.0, ans=0.2 2024-08-14 18:36:41,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2790520.0, ans=0.125 2024-08-14 18:36:54,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.282e+01 2.542e+01 2.895e+01 4.405e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-14 18:37:02,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2790620.0, ans=15.0 2024-08-14 18:37:21,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0 2024-08-14 18:37:23,005 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-14 18:37:33,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3750, loss[loss=0.08346, beats_loss=0.01482, ecapa_loss=0.0001623, whisper_loss=0.06702, over 21121.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01065, ecapa_loss=0.0001545, whisper_loss=0.09191, over 3893038.56 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:37:34,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-14 18:37:47,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2791020.0, ans=0.2 2024-08-14 18:38:30,729 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-14 18:38:41,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3800, loss[loss=0.0932, beats_loss=0.01059, ecapa_loss=0.0001383, whisper_loss=0.08123, over 20465.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01062, ecapa_loss=0.0001558, whisper_loss=0.09204, over 3890255.75 frames. ], batch size: 80, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:38:43,000 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-14 18:38:51,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2791420.0, ans=0.125 2024-08-14 18:39:09,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.383e+01 2.672e+01 2.913e+01 4.805e+01, threshold=5.345e+01, percent-clipped=0.0 2024-08-14 18:39:16,412 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 18:39:19,210 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 18:39:40,986 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:39:44,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-14 18:39:46,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-08-14 18:39:48,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3850, loss[loss=0.09636, beats_loss=0.009176, ecapa_loss=0.0001919, whisper_loss=0.08527, over 16419.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01069, ecapa_loss=0.000154, whisper_loss=0.09128, over 3874306.88 frames. 
], batch size: 68, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:40:01,557 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 18:40:06,763 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 18:40:08,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2792020.0, ans=0.125 2024-08-14 18:40:16,933 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:40:33,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=15.0 2024-08-14 18:40:37,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0 2024-08-14 18:40:39,847 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-14 18:40:49,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2792320.0, ans=0.1 2024-08-14 18:40:49,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2792320.0, ans=0.1 2024-08-14 18:40:49,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-14 18:40:55,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3900, loss[loss=0.08232, beats_loss=0.01073, ecapa_loss=0.0002028, whisper_loss=0.06956, over 17355.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001563, whisper_loss=0.09167, over 3906202.01 frames. 
], batch size: 73, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:40:58,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-14 18:41:00,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2792420.0, ans=0.125 2024-08-14 18:41:03,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2792420.0, ans=0.0 2024-08-14 18:41:04,456 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 18:41:11,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-14 18:41:24,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.416e+01 2.719e+01 3.088e+01 3.540e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-14 18:41:30,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-14 18:41:37,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2792720.0, ans=10.0 2024-08-14 18:41:37,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2792720.0, ans=0.125 2024-08-14 18:41:38,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2792720.0, ans=0.125 2024-08-14 18:42:03,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 3950, loss[loss=0.09843, beats_loss=0.01112, ecapa_loss=0.0001566, whisper_loss=0.08574, over 22115.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01059, ecapa_loss=0.0001566, whisper_loss=0.09225, over 3919014.55 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:42:27,923 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-14 18:42:33,307 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 18:42:36,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2793120.0, ans=0.2 2024-08-14 18:42:50,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2793220.0, ans=0.125 2024-08-14 18:42:56,551 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 25 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-14 18:42:58,989 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 18:43:09,388 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 18:43:10,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4000, loss[loss=0.1003, beats_loss=0.01312, ecapa_loss=0.0001871, whisper_loss=0.08527, over 15847.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01055, ecapa_loss=0.0001572, whisper_loss=0.09218, over 3892498.49 frames. ], batch size: 69, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:43:13,284 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 18:43:18,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. 
limit=10.0 2024-08-14 18:43:23,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2793520.0, ans=0.125 2024-08-14 18:43:31,471 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-14 18:43:32,883 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-14 18:43:34,259 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 18:43:34,776 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.106e+01 2024-08-14 18:43:39,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.386e+01 2.659e+01 3.102e+01 4.594e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-14 18:43:48,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2793620.0, ans=0.125 2024-08-14 18:43:53,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2024-08-14 18:43:57,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0 2024-08-14 18:44:19,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4050, loss[loss=0.09811, beats_loss=0.01113, ecapa_loss=0.0001427, whisper_loss=0.08556, over 20560.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01049, ecapa_loss=0.0001583, whisper_loss=0.09286, over 3912888.22 frames. 
], batch size: 80, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:44:26,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2793920.0, ans=0.125 2024-08-14 18:44:44,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2794020.0, ans=0.125 2024-08-14 18:44:45,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.36 vs. limit=10.0 2024-08-14 18:44:51,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2794120.0, ans=0.0 2024-08-14 18:44:54,885 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-14 18:45:04,394 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-14 18:45:05,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2794220.0, ans=0.125 2024-08-14 18:45:14,097 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 18:45:14,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2794320.0, ans=0.125 2024-08-14 18:45:26,335 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-14 18:45:27,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4100, loss[loss=0.1115, beats_loss=0.01157, ecapa_loss=0.0001384, whisper_loss=0.09858, over 19611.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01063, ecapa_loss=0.0001574, whisper_loss=0.09231, over 3921964.79 frames. 
], batch size: 78, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:45:35,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2794420.0, ans=0.125 2024-08-14 18:45:49,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2794520.0, ans=0.0 2024-08-14 18:45:57,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.354e+01 2.603e+01 2.918e+01 6.130e+01, threshold=5.207e+01, percent-clipped=1.0 2024-08-14 18:46:01,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2794620.0, ans=0.125 2024-08-14 18:46:08,002 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-14 18:46:08,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2024-08-14 18:46:09,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794720.0, ans=0.1 2024-08-14 18:46:20,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2794720.0, ans=10.0 2024-08-14 18:46:22,767 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 18:46:25,454 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
21 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 18:46:25,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794820.0, ans=0.1 2024-08-14 18:46:29,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2794820.0, ans=0.0 2024-08-14 18:46:36,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4150, loss[loss=0.09873, beats_loss=0.0119, ecapa_loss=0.0001578, whisper_loss=0.08525, over 22183.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01071, ecapa_loss=0.0001572, whisper_loss=0.09161, over 3929289.32 frames. ], batch size: 93, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:46:42,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.57 vs. limit=6.0 2024-08-14 18:47:03,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2795120.0, ans=0.04949747468305833 2024-08-14 18:47:11,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2795120.0, ans=0.125 2024-08-14 18:47:19,710 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-14 18:47:25,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2795220.0, ans=0.125 2024-08-14 18:47:44,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4200, loss[loss=0.09227, beats_loss=0.01089, ecapa_loss=0.0001384, whisper_loss=0.07999, over 20890.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01071, ecapa_loss=0.0001564, whisper_loss=0.09134, over 3910024.79 frames. 
], batch size: 84, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:47:45,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2795420.0, ans=10.0 2024-08-14 18:47:50,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2024-08-14 18:47:54,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2795420.0, ans=0.125 2024-08-14 18:47:54,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795420.0, ans=0.1 2024-08-14 18:47:59,158 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 40 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 18:48:03,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2795520.0, ans=0.0 2024-08-14 18:48:04,465 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 18:48:09,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2795520.0, ans=0.2 2024-08-14 18:48:12,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.416e+01 2.672e+01 2.930e+01 6.892e+01, threshold=5.345e+01, percent-clipped=1.0 2024-08-14 18:48:15,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2795620.0, ans=0.125 2024-08-14 18:48:19,645 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-14 18:48:22,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. 
limit=22.5 2024-08-14 18:48:35,773 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-14 18:48:38,648 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 18:48:52,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4250, loss[loss=0.1033, beats_loss=0.009364, ecapa_loss=0.0001704, whisper_loss=0.09222, over 18313.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01069, ecapa_loss=0.0001556, whisper_loss=0.09183, over 3919432.99 frames. ], batch size: 76, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:48:52,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2795920.0, ans=0.0 2024-08-14 18:49:00,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2795920.0, ans=0.1 2024-08-14 18:49:11,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2024-08-14 18:49:14,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2796020.0, ans=0.125 2024-08-14 18:49:25,389 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 36 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 18:49:27,997 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 18:49:37,835 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 18:49:41,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. 
limit=15.0 2024-08-14 18:49:47,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2024-08-14 18:49:52,117 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-14 18:50:02,755 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4300, loss[loss=0.1128, beats_loss=0.007582, ecapa_loss=0.0001609, whisper_loss=0.1036, over 14631.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01068, ecapa_loss=0.0001558, whisper_loss=0.09161, over 3897970.46 frames. ], batch size: 57, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:50:06,257 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 40 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 18:50:10,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2796420.0, ans=0.0 2024-08-14 18:50:13,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2796420.0, ans=0.0 2024-08-14 18:50:19,893 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 18:50:34,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.393e+01 2.675e+01 3.079e+01 4.317e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-14 18:50:55,652 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:50:59,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2796720.0, ans=0.09899494936611666 2024-08-14 18:51:11,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. 
limit=15.0 2024-08-14 18:51:18,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4350, loss[loss=0.09561, beats_loss=0.01078, ecapa_loss=0.0001557, whisper_loss=0.08327, over 21414.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001556, whisper_loss=0.09082, over 3916016.05 frames. ], batch size: 85, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:51:26,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2796920.0, ans=0.1 2024-08-14 18:51:30,479 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 18:51:43,956 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 18:51:48,518 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 18:51:56,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=15.0 2024-08-14 18:52:09,908 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-14 18:52:14,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2797220.0, ans=0.125 2024-08-14 18:52:19,794 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-14 18:52:22,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2797320.0, ans=0.0 2024-08-14 18:52:25,722 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 18:52:32,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4400, loss[loss=0.1121, beats_loss=0.009878, ecapa_loss=0.0001785, whisper_loss=0.1005, over 23144.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001551, whisper_loss=0.09111, over 3943691.02 frames. ], batch size: 94, lr: 3.14e-03, grad_scale: 1.152921504606847e+18 2024-08-14 18:52:37,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2024-08-14 18:52:46,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797420.0, ans=0.1 2024-08-14 18:52:55,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2797520.0, ans=10.0 2024-08-14 18:52:56,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2797520.0, ans=0.125 2024-08-14 18:53:02,394 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 18:53:04,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.379e+01 2.659e+01 2.952e+01 7.187e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-14 18:53:23,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2797720.0, ans=0.2 2024-08-14 18:53:29,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2797720.0, ans=0.125 2024-08-14 18:53:48,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4450, loss[loss=0.09631, beats_loss=0.01086, ecapa_loss=0.0001629, whisper_loss=0.08383, over 18795.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001536, whisper_loss=0.09054, over 3928812.56 frames. ], batch size: 73, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:53:55,208 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 18:54:13,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2024-08-14 18:54:19,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2798120.0, ans=0.0 2024-08-14 18:54:22,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2798120.0, ans=0.2 2024-08-14 18:54:23,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2798120.0, ans=0.025 2024-08-14 18:54:31,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2798120.0, ans=0.0 2024-08-14 18:54:44,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2798220.0, ans=0.1 2024-08-14 18:54:48,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2798220.0, ans=0.0 2024-08-14 18:54:50,493 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 18:55:06,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4500, loss[loss=0.109, beats_loss=0.00983, ecapa_loss=0.0001293, whisper_loss=0.0979, over 17858.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001537, whisper_loss=0.09021, over 3917604.60 frames. ], batch size: 68, lr: 3.14e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:55:38,122 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-14 18:55:38,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5 2024-08-14 18:55:40,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2798620.0, ans=0.125 2024-08-14 18:55:42,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-14 18:55:42,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.288e+01 2.644e+01 2.918e+01 3.847e+02, threshold=5.287e+01, percent-clipped=3.0 2024-08-14 18:55:44,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2798620.0, ans=0.125 2024-08-14 18:56:26,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4550, loss[loss=0.1071, beats_loss=0.009576, ecapa_loss=0.0001592, whisper_loss=0.09592, over 23050.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001536, whisper_loss=0.09005, over 3938982.44 frames. ], batch size: 90, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:56:27,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2798920.0, ans=0.1 2024-08-14 18:56:31,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2798920.0, ans=0.125 2024-08-14 18:56:44,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2799020.0, ans=0.1 2024-08-14 18:56:50,421 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
22 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-14 18:56:50,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2799020.0, ans=0.5 2024-08-14 18:56:53,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2799020.0, ans=0.125 2024-08-14 18:57:14,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2799220.0, ans=0.1 2024-08-14 18:57:19,941 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 18:57:39,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2799320.0, ans=0.1 2024-08-14 18:57:43,384 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4600, loss[loss=0.1169, beats_loss=0.009754, ecapa_loss=0.0001375, whisper_loss=0.1057, over 23994.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01084, ecapa_loss=0.0001534, whisper_loss=0.08898, over 3900858.44 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:57:45,123 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 18:57:46,635 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-14 18:58:00,374 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-14 18:58:01,597 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
24 from LS+wenet, 29 from Vox, 38 from AS 2024-08-14 18:58:01,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2799520.0, ans=0.0 2024-08-14 18:58:03,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2799520.0, ans=0.2 2024-08-14 18:58:15,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.391e+01 2.667e+01 2.855e+01 4.020e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-14 18:58:21,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2799620.0, ans=0.125 2024-08-14 18:58:58,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4650, loss[loss=0.1238, beats_loss=0.00809, ecapa_loss=0.0001319, whisper_loss=0.1144, over 19949.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01075, ecapa_loss=0.000154, whisper_loss=0.08977, over 3893200.21 frames. ], batch size: 73, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 18:59:03,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2799920.0, ans=0.125 2024-08-14 18:59:13,872 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 from AS 2024-08-14 18:59:23,099 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 18:59:32,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.96 vs. 
limit=22.5 2024-08-14 18:59:33,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2800120.0, ans=0.125 2024-08-14 18:59:40,489 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 18:59:40,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0 2024-08-14 18:59:53,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800220.0, ans=0.1 2024-08-14 19:00:17,638 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4700, loss[loss=0.08251, beats_loss=0.01233, ecapa_loss=0.0001193, whisper_loss=0.06898, over 17321.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0001532, whisper_loss=0.08984, over 3883850.78 frames. ], batch size: 67, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:00:32,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2800520.0, ans=0.125 2024-08-14 19:00:50,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.338e+01 2.588e+01 2.905e+01 3.899e+01, threshold=5.177e+01, percent-clipped=0.0 2024-08-14 19:00:57,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2800620.0, ans=0.0 2024-08-14 19:01:17,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2800820.0, ans=0.09899494936611666 2024-08-14 19:01:17,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2800820.0, ans=0.125 2024-08-14 19:01:25,012 INFO [scaling.py:214] (3/4) 
ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2800820.0, ans=0.125 2024-08-14 19:01:28,892 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 from AS 2024-08-14 19:01:33,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4750, loss[loss=0.1213, beats_loss=0.00827, ecapa_loss=0.0001502, whisper_loss=0.1116, over 18730.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001541, whisper_loss=0.08989, over 3871956.12 frames. ], batch size: 70, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:01:40,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2800920.0, ans=0.125 2024-08-14 19:01:43,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-14 19:01:46,355 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 19:01:50,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2801020.0, ans=0.125 2024-08-14 19:01:51,259 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 15 from Vox, 44 from AS 2024-08-14 19:02:08,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. 
limit=15.0 2024-08-14 19:02:10,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2801120.0, ans=0.07 2024-08-14 19:02:17,753 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:02:26,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2801220.0, ans=0.125 2024-08-14 19:02:37,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2801320.0, ans=0.125 2024-08-14 19:02:37,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=12.0 2024-08-14 19:02:43,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2801320.0, ans=0.0 2024-08-14 19:02:47,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4800, loss[loss=0.1009, beats_loss=0.01206, ecapa_loss=0.0001294, whisper_loss=0.08752, over 22166.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.000155, whisper_loss=0.09045, over 3888283.59 frames. ], batch size: 86, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:02:51,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=22.5 2024-08-14 19:02:59,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-08-14 19:03:03,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2801520.0, ans=0.0 2024-08-14 19:03:13,447 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
21 from LS+wenet, 18 from Vox, 26 from AS 2024-08-14 19:03:18,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2801620.0, ans=0.0 2024-08-14 19:03:20,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.344e+01 2.546e+01 2.876e+01 4.578e+02, threshold=5.092e+01, percent-clipped=1.0 2024-08-14 19:03:21,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2801620.0, ans=0.1 2024-08-14 19:03:29,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801620.0, ans=0.1 2024-08-14 19:03:56,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2801820.0, ans=0.07 2024-08-14 19:04:01,154 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4850, loss[loss=0.1051, beats_loss=0.01093, ecapa_loss=0.0001401, whisper_loss=0.09281, over 15580.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001542, whisper_loss=0.08995, over 3911338.81 frames. ], batch size: 60, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:04:04,054 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 from AS 2024-08-14 19:04:27,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-14 19:04:41,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2802120.0, ans=0.2 2024-08-14 19:04:53,006 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 16 from Vox, 26 from AS 2024-08-14 19:04:54,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. 
limit=6.0 2024-08-14 19:05:17,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4900, loss[loss=0.0914, beats_loss=0.009736, ecapa_loss=0.0001718, whisper_loss=0.07994, over 16707.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01067, ecapa_loss=0.0001546, whisper_loss=0.09003, over 3880261.45 frames. ], batch size: 66, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:05:19,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2802420.0, ans=0.2 2024-08-14 19:05:41,950 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 25 from Vox, 27 from AS 2024-08-14 19:05:52,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.355e+01 2.636e+01 2.883e+01 6.029e+01, threshold=5.271e+01, percent-clipped=1.0 2024-08-14 19:06:06,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2802720.0, ans=0.125 2024-08-14 19:06:38,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 4950, loss[loss=0.103, beats_loss=0.009577, ecapa_loss=0.000175, whisper_loss=0.09165, over 18946.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.000156, whisper_loss=0.0901, over 3856368.20 frames. ], batch size: 77, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:06:39,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2802920.0, ans=22.5 2024-08-14 19:06:56,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2803020.0, ans=0.0 2024-08-14 19:07:10,693 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 24 from Vox, 37 from AS 2024-08-14 19:07:15,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2803120.0, ans=0.07 2024-08-14 19:07:29,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.59 vs. limit=22.5 2024-08-14 19:07:54,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5000, loss[loss=0.08796, beats_loss=0.009729, ecapa_loss=0.0002124, whisper_loss=0.07611, over 14761.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001559, whisper_loss=0.09039, over 3854824.81 frames. ], batch size: 61, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:08:01,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2803420.0, ans=0.09899494936611666 2024-08-14 19:08:06,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2803420.0, ans=0.0 2024-08-14 19:08:16,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2803520.0, ans=0.07 2024-08-14 19:08:22,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2803620.0, ans=0.0 2024-08-14 19:08:25,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.350e+01 2.620e+01 2.995e+01 1.741e+02, threshold=5.241e+01, percent-clipped=2.0 2024-08-14 19:08:29,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2803620.0, ans=0.1 2024-08-14 19:08:46,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2803720.0, ans=0.2 2024-08-14 19:08:49,698 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 22 from Vox, 34 from AS 2024-08-14 19:08:51,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=12.0 2024-08-14 19:08:55,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-14 19:08:59,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2803820.0, ans=0.125 2024-08-14 19:09:05,118 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 33 from LS+wenet, 18 from Vox, 26 from AS 2024-08-14 19:09:06,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5050, loss[loss=0.1244, beats_loss=0.008705, ecapa_loss=0.0001519, whisper_loss=0.1142, over 19630.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001559, whisper_loss=0.09038, over 3869875.97 frames. ], batch size: 77, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:09:08,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2803920.0, ans=0.125 2024-08-14 19:09:10,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-14 19:09:14,244 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 from AS 2024-08-14 19:09:21,424 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 from AS 2024-08-14 19:09:36,169 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 18 from Vox, 30 from AS 2024-08-14 19:09:53,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2804220.0, ans=0.2 2024-08-14 19:10:03,244 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:10:15,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2804320.0, ans=0.0 2024-08-14 19:10:21,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5100, loss[loss=0.1103, beats_loss=0.01003, ecapa_loss=0.00015, whisper_loss=0.09872, over 14642.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001564, whisper_loss=0.09088, over 3844737.09 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:10:22,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804420.0, ans=0.1 2024-08-14 19:10:26,943 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 16 from Vox, 38 from AS 2024-08-14 19:10:31,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=2804420.0, ans=12.0 2024-08-14 19:10:36,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2024-08-14 19:10:56,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.368e+01 2.597e+01 2.934e+01 4.134e+01, threshold=5.194e+01, percent-clipped=0.0 2024-08-14 19:10:59,609 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.578e-02 2024-08-14 19:11:07,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804620.0, ans=0.1 2024-08-14 19:11:16,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2804720.0, ans=0.125 2024-08-14 19:11:40,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5150, loss[loss=0.1004, beats_loss=0.01202, ecapa_loss=0.0001834, whisper_loss=0.08651, over 21510.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001551, whisper_loss=0.09059, over 3878851.59 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:11:45,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2804920.0, ans=0.0 2024-08-14 19:11:47,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2804920.0, ans=0.125 2024-08-14 19:11:50,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2804920.0, ans=0.04949747468305833 2024-08-14 19:12:11,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2805120.0, ans=0.125 2024-08-14 19:12:19,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2805120.0, ans=0.0 2024-08-14 19:12:39,175 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2805320.0, ans=0.05 2024-08-14 19:12:45,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2805320.0, ans=0.125 2024-08-14 19:12:50,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2805320.0, ans=0.125 2024-08-14 19:12:54,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5200, loss[loss=0.08414, beats_loss=0.01324, ecapa_loss=0.0001251, whisper_loss=0.06965, over 23278.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001541, whisper_loss=0.09095, over 3844315.82 frames. ], batch size: 95, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:12:56,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2805420.0, ans=0.0 2024-08-14 19:12:59,696 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 from AS 2024-08-14 19:12:59,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2805420.0, ans=0.125 2024-08-14 19:13:03,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2805420.0, ans=0.0 2024-08-14 19:13:09,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2805520.0, ans=0.125 2024-08-14 19:13:17,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2805520.0, ans=0.125 2024-08-14 19:13:19,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2805520.0, ans=0.125 2024-08-14 19:13:28,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.358e+01 2.582e+01 2.808e+01 4.877e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 19:13:30,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2805620.0, ans=0.0 2024-08-14 19:13:38,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2024-08-14 19:13:46,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2805720.0, ans=0.125 2024-08-14 19:13:53,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=12.0 2024-08-14 19:13:55,867 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-14 19:14:00,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=15.0 2024-08-14 19:14:07,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2805820.0, ans=0.125 2024-08-14 19:14:10,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5250, loss[loss=0.09766, beats_loss=0.01187, ecapa_loss=0.0001333, whisper_loss=0.08445, over 19524.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001533, whisper_loss=0.09053, over 3829170.66 frames. ], batch size: 79, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:14:19,645 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 from AS 2024-08-14 19:14:44,605 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 from AS 2024-08-14 19:15:27,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5300, loss[loss=0.1186, beats_loss=0.009756, ecapa_loss=0.0001622, whisper_loss=0.1073, over 22934.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.0001534, whisper_loss=0.09042, over 3828640.21 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:15:34,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2806420.0, ans=0.125 2024-08-14 19:15:37,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2806420.0, ans=0.0 2024-08-14 19:15:41,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-08-14 19:15:58,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2024-08-14 19:16:02,012 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.456e+01 2.845e+01 4.034e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-14 19:16:04,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2806620.0, ans=0.125 2024-08-14 19:16:42,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.47 vs. limit=10.0 2024-08-14 19:16:45,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5350, loss[loss=0.0879, beats_loss=0.01227, ecapa_loss=0.0001337, whisper_loss=0.07429, over 14232.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001532, whisper_loss=0.09092, over 3822875.27 frames. ], batch size: 55, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:16:51,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-14 19:17:07,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2807020.0, ans=0.0 2024-08-14 19:17:12,979 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 from AS 2024-08-14 19:17:16,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. 
limit=15.0 2024-08-14 19:17:17,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2807020.0, ans=0.015 2024-08-14 19:17:24,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2807120.0, ans=0.1 2024-08-14 19:17:28,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2807120.0, ans=0.5 2024-08-14 19:17:47,982 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 13 from Vox, 38 from AS 2024-08-14 19:18:13,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5400, loss[loss=0.1095, beats_loss=0.01165, ecapa_loss=0.0001248, whisper_loss=0.09662, over 20487.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001543, whisper_loss=0.09124, over 3815750.11 frames. ], batch size: 77, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:18:49,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2807620.0, ans=0.0 2024-08-14 19:18:50,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.371e+01 2.761e+01 3.113e+01 5.866e+01, threshold=5.523e+01, percent-clipped=1.0 2024-08-14 19:19:05,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2807720.0, ans=0.0 2024-08-14 19:19:09,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2807720.0, ans=0.125 2024-08-14 19:19:16,079 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
30 from LS+wenet, 24 from Vox, 27 from AS 2024-08-14 19:19:16,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2807720.0, ans=0.0 2024-08-14 19:19:19,222 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 19:19:24,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2807820.0, ans=0.125 2024-08-14 19:19:24,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2807820.0, ans=0.07 2024-08-14 19:19:40,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2807820.0, ans=0.0 2024-08-14 19:19:43,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5450, loss[loss=0.1268, beats_loss=0.01123, ecapa_loss=0.0001563, whisper_loss=0.114, over 23636.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.000155, whisper_loss=0.09164, over 3855207.42 frames. ], batch size: 94, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:19:56,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2024-08-14 19:20:09,152 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 from AS 2024-08-14 19:20:10,425 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 from AS 2024-08-14 19:20:16,706 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 from AS 2024-08-14 19:20:26,448 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 19 from Vox, 26 from AS 2024-08-14 19:20:29,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2808120.0, ans=0.125 2024-08-14 19:20:31,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2808120.0, ans=0.1 2024-08-14 19:20:39,883 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 34 from LS+wenet, 20 from Vox, 27 from AS 2024-08-14 19:20:41,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-14 19:20:49,160 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 from AS 2024-08-14 19:20:53,563 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 32 from Vox, 32 from AS 2024-08-14 19:21:02,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2808320.0, ans=0.0 2024-08-14 19:21:17,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2808320.0, ans=0.125 2024-08-14 19:21:23,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5500, loss[loss=0.114, beats_loss=0.009448, ecapa_loss=0.0001889, whisper_loss=0.1027, over 20355.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001554, whisper_loss=0.0914, over 3884650.10 frames. ], batch size: 80, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:21:41,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2808520.0, ans=0.0 2024-08-14 19:21:48,028 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-14 19:21:54,431 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 from AS 2024-08-14 19:22:09,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.424e+01 2.779e+01 3.099e+01 3.330e+02, threshold=5.557e+01, percent-clipped=2.0 2024-08-14 19:22:11,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2808620.0, ans=0.125 2024-08-14 19:22:22,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2808620.0, ans=0.125 2024-08-14 19:22:22,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2808620.0, ans=0.125 2024-08-14 19:22:25,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808620.0, ans=0.1 2024-08-14 19:22:27,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2808720.0, ans=0.125 2024-08-14 19:22:27,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2808720.0, ans=0.2 2024-08-14 19:23:08,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2808820.0, ans=0.125 2024-08-14 19:23:11,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5550, loss[loss=0.08548, beats_loss=0.009812, ecapa_loss=0.0001494, whisper_loss=0.07417, over 15474.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001553, whisper_loss=0.09156, over 3915200.20 frames. 
], batch size: 59, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:23:21,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2808920.0, ans=0.125 2024-08-14 19:24:01,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-14 19:24:06,337 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 25 from Vox, 25 from AS 2024-08-14 19:24:14,431 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 from AS 2024-08-14 19:24:23,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2809220.0, ans=0.1 2024-08-14 19:24:29,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2024-08-14 19:24:43,616 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 from AS 2024-08-14 19:24:52,705 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5600, loss[loss=0.07946, beats_loss=0.01462, ecapa_loss=0.0001161, whisper_loss=0.06368, over 15214.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001549, whisper_loss=0.09097, over 3921568.94 frames. ], batch size: 62, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:24:53,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2809420.0, ans=0.07 2024-08-14 19:25:08,881 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-14 19:25:10,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2809520.0, ans=0.125 2024-08-14 19:25:24,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.292e+01 2.694e+01 2.993e+01 3.874e+01, threshold=5.387e+01, percent-clipped=0.0 2024-08-14 19:25:25,873 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-14 19:25:27,479 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-14 19:25:28,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=2809620.0, ans=12.0 2024-08-14 19:25:31,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2024-08-14 19:25:47,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2809720.0, ans=0.125 2024-08-14 19:25:52,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2809820.0, ans=0.125 2024-08-14 19:26:04,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5650, loss[loss=0.1287, beats_loss=0.008615, ecapa_loss=0.0001296, whisper_loss=0.1188, over 21490.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001546, whisper_loss=0.091, over 3955443.31 frames. 
], batch size: 76, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:26:26,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.251e-01 2024-08-14 19:26:39,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2810120.0, ans=0.125 2024-08-14 19:26:44,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2810120.0, ans=0.1 2024-08-14 19:27:11,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2810320.0, ans=0.125 2024-08-14 19:27:13,075 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-14 19:27:19,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5700, loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001669, whisper_loss=0.08916, over 20608.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001551, whisper_loss=0.09066, over 3916910.19 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:27:29,145 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 19:27:30,496 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-14 19:27:43,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2810520.0, ans=0.125 2024-08-14 19:27:45,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2810520.0, ans=22.5 2024-08-14 19:27:51,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.307e+01 2.514e+01 2.816e+01 4.087e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-14 19:27:56,652 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-14 19:28:12,827 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-14 19:28:14,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2810720.0, ans=0.04949747468305833 2024-08-14 19:28:19,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2024-08-14 19:28:20,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2810820.0, ans=0.05 2024-08-14 19:28:32,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5750, loss[loss=0.09578, beats_loss=0.01081, ecapa_loss=0.000153, whisper_loss=0.08344, over 22845.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.0001553, whisper_loss=0.09091, over 3922634.29 frames. ], batch size: 94, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:28:43,875 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
13 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 19:28:47,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-08-14 19:29:01,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2811020.0, ans=0.125 2024-08-14 19:29:04,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=12.0 2024-08-14 19:29:09,037 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-14 19:29:27,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2811220.0, ans=0.125 2024-08-14 19:29:46,724 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 19:29:46,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2811320.0, ans=0.125 2024-08-14 19:29:49,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5800, loss[loss=0.1085, beats_loss=0.009947, ecapa_loss=0.0001342, whisper_loss=0.09725, over 17056.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01072, ecapa_loss=0.0001553, whisper_loss=0.08994, over 3881087.76 frames. 
], batch size: 65, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:29:56,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2811420.0, ans=0.0 2024-08-14 19:30:02,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2811420.0, ans=0.0 2024-08-14 19:30:04,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2811520.0, ans=15.0 2024-08-14 19:30:22,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.246e+01 2.501e+01 2.765e+01 4.187e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-14 19:30:35,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2811720.0, ans=0.2 2024-08-14 19:30:42,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2811720.0, ans=0.1 2024-08-14 19:30:46,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2811720.0, ans=0.125 2024-08-14 19:30:49,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2811820.0, ans=0.125 2024-08-14 19:31:03,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5850, loss[loss=0.1116, beats_loss=0.0103, ecapa_loss=0.0001926, whisper_loss=0.09937, over 22165.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001564, whisper_loss=0.08993, over 3866067.84 frames. 
], batch size: 94, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:31:03,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2811920.0, ans=0.0 2024-08-14 19:31:19,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2812020.0, ans=0.125 2024-08-14 19:31:21,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2024-08-14 19:31:45,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2812120.0, ans=15.0 2024-08-14 19:31:53,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2812220.0, ans=0.125 2024-08-14 19:32:00,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2812320.0, ans=0.0 2024-08-14 19:32:03,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2812320.0, ans=0.2 2024-08-14 19:32:16,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5900, loss[loss=0.08766, beats_loss=0.01194, ecapa_loss=0.0001567, whisper_loss=0.07415, over 17765.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001567, whisper_loss=0.09021, over 3852737.60 frames. ], batch size: 74, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:32:35,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.36 vs. limit=22.5 2024-08-14 19:32:37,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.97 vs. 
limit=22.5 2024-08-14 19:32:42,081 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 19:32:43,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2812520.0, ans=0.125 2024-08-14 19:32:49,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.347e+01 2.667e+01 3.027e+01 4.357e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-14 19:33:02,204 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-14 19:33:04,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. limit=10.0 2024-08-14 19:33:22,588 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 31 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-14 19:33:22,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2812820.0, ans=0.125 2024-08-14 19:33:30,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 5950, loss[loss=0.09056, beats_loss=0.01137, ecapa_loss=0.0001575, whisper_loss=0.07761, over 17681.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001555, whisper_loss=0.09043, over 3884357.43 frames. 
], batch size: 71, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:33:32,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2812920.0, ans=0.125 2024-08-14 19:33:39,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2812920.0, ans=0.125 2024-08-14 19:33:45,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2813020.0, ans=0.125 2024-08-14 19:34:02,933 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 19:34:45,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6000, loss[loss=0.09434, beats_loss=0.01279, ecapa_loss=0.000143, whisper_loss=0.08012, over 22971.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001546, whisper_loss=0.09042, over 3908446.10 frames. ], batch size: 93, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:34:45,148 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 19:35:23,110 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005442, whisper_loss=0.2472, over 922467.00 frames. 2024-08-14 19:35:42,565 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on SV_voxceleb1: loss=0.004201, beats_loss=0, ecapa_loss=0.0004201, whisper_loss=0, over 939242.00 frames. 2024-08-14 19:37:36,288 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on AT_audioset: loss=0.02345, beats_loss=0.02345, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 19:37:36,292 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 19:37:36,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2813420.0, ans=0.1 2024-08-14 19:37:39,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2813420.0, ans=0.1 2024-08-14 19:37:44,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2813420.0, ans=0.125 2024-08-14 19:38:07,771 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 19:38:10,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.295e+01 2.518e+01 2.791e+01 2.335e+02, threshold=5.037e+01, percent-clipped=2.0 2024-08-14 19:38:24,819 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-14 19:38:37,071 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-14 19:38:53,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6050, loss[loss=0.1151, beats_loss=0.01118, ecapa_loss=0.0001442, whisper_loss=0.1025, over 22000.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001543, whisper_loss=0.09043, over 3864302.05 frames. 
], batch size: 85, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:38:59,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2813920.0, ans=0.125 2024-08-14 19:39:19,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2814020.0, ans=0.125 2024-08-14 19:39:21,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2814120.0, ans=0.0 2024-08-14 19:39:25,364 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-14 19:39:39,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2814220.0, ans=0.125 2024-08-14 19:39:40,354 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-14 19:39:43,087 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-14 19:39:50,814 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 19:39:54,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2814320.0, ans=0.1 2024-08-14 19:39:56,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2814320.0, ans=0.07 2024-08-14 19:40:00,866 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-14 19:40:03,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2814320.0, ans=0.0 2024-08-14 19:40:06,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6100, loss[loss=0.1009, beats_loss=0.01173, ecapa_loss=0.0001676, whisper_loss=0.08751, over 21883.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01079, ecapa_loss=0.0001534, whisper_loss=0.08964, over 3878934.22 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:40:17,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-14 19:40:18,502 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-14 19:40:18,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814420.0, ans=0.1 2024-08-14 19:40:20,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-14 19:40:27,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-08-14 19:40:28,996 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-14 19:40:38,720 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.270e+01 2.572e+01 2.867e+01 4.147e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-14 19:41:13,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2814820.0, ans=0.125 2024-08-14 19:41:19,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6150, loss[loss=0.1165, beats_loss=0.01105, ecapa_loss=0.0001263, whisper_loss=0.1042, over 15890.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0001536, whisper_loss=0.08975, over 3871624.60 frames. ], batch size: 61, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:41:30,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-14 19:41:43,234 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-14 19:41:49,610 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-14 19:42:21,834 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-14 19:42:32,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6200, loss[loss=0.1204, beats_loss=0.009685, ecapa_loss=0.0001779, whisper_loss=0.1089, over 23058.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.000154, whisper_loss=0.09077, over 3879161.33 frames. ], batch size: 92, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:42:41,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2815420.0, ans=0.125 2024-08-14 19:42:55,579 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-14 19:42:56,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2815520.0, ans=0.0 2024-08-14 19:42:58,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2815520.0, ans=0.2 2024-08-14 19:43:05,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2024-08-14 19:43:05,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.332e+01 2.614e+01 2.876e+01 4.461e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-14 19:43:12,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2815620.0, ans=0.1 2024-08-14 19:43:30,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2815720.0, ans=0.1 2024-08-14 19:43:32,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2815820.0, ans=0.125 2024-08-14 19:43:40,756 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-14 19:43:42,388 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-14 19:43:48,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6250, loss[loss=0.1122, beats_loss=0.01083, ecapa_loss=0.0001421, whisper_loss=0.09993, over 22265.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001546, whisper_loss=0.0907, over 3875457.53 frames. 
], batch size: 91, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:44:00,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2815920.0, ans=0.125 2024-08-14 19:44:15,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2816020.0, ans=22.5 2024-08-14 19:44:42,330 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-14 19:45:00,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2816420.0, ans=0.125 2024-08-14 19:45:01,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6300, loss[loss=0.08604, beats_loss=0.01392, ecapa_loss=0.000162, whisper_loss=0.07051, over 17157.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001541, whisper_loss=0.0904, over 3855699.42 frames. ], batch size: 68, lr: 3.13e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:45:04,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2816420.0, ans=0.125 2024-08-14 19:45:33,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.242e+01 2.428e+01 2.656e+01 5.822e+01, threshold=4.856e+01, percent-clipped=1.0 2024-08-14 19:45:52,290 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 19:46:13,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6350, loss[loss=0.09958, beats_loss=0.01351, ecapa_loss=0.0001156, whisper_loss=0.08491, over 24172.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001545, whisper_loss=0.09043, over 3889732.90 frames. 
], batch size: 96, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:46:16,735 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-14 19:46:26,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2817020.0, ans=0.025 2024-08-14 19:46:37,882 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-14 19:46:48,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2817120.0, ans=0.1 2024-08-14 19:46:51,103 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 19:46:53,943 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 19:47:00,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2817220.0, ans=0.125 2024-08-14 19:47:24,599 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 11 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 19:47:26,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2817320.0, ans=0.0 2024-08-14 19:47:28,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6400, loss[loss=0.1262, beats_loss=0.009992, ecapa_loss=0.0001605, whisper_loss=0.1146, over 24054.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01077, ecapa_loss=0.0001536, whisper_loss=0.0907, over 3884045.38 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:47:42,242 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
27 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 19:47:45,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=2817520.0, ans=0.1 2024-08-14 19:47:58,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2817620.0, ans=0.2 2024-08-14 19:47:58,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2817620.0, ans=0.1 2024-08-14 19:48:01,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.350e+01 2.618e+01 2.916e+01 9.868e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-14 19:48:19,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2817720.0, ans=0.0 2024-08-14 19:48:36,606 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-14 19:48:42,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6450, loss[loss=0.1106, beats_loss=0.009861, ecapa_loss=0.0001529, whisper_loss=0.0992, over 19949.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01074, ecapa_loss=0.0001549, whisper_loss=0.0909, over 3863272.55 frames. ], batch size: 80, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:48:53,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2817920.0, ans=0.0 2024-08-14 19:49:01,817 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 19:49:21,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2818120.0, ans=0.125 2024-08-14 19:49:24,614 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-14 19:49:35,797 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.404e+00 2024-08-14 19:49:46,549 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-14 19:50:00,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6500, loss[loss=0.1188, beats_loss=0.01097, ecapa_loss=0.00012, whisper_loss=0.1066, over 18904.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001553, whisper_loss=0.09173, over 3874929.35 frames. ], batch size: 71, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:50:14,528 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 19:50:15,908 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 19:50:16,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-08-14 19:50:33,958 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-14 19:50:35,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.395e+01 2.629e+01 2.951e+01 4.669e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-14 19:50:51,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2818720.0, ans=0.2 2024-08-14 19:51:10,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2818820.0, ans=0.1 2024-08-14 19:51:16,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6550, loss[loss=0.1268, beats_loss=0.007409, ecapa_loss=0.0001474, whisper_loss=0.1179, over 19583.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01067, ecapa_loss=0.0001551, whisper_loss=0.09174, over 3876146.07 frames. ], batch size: 75, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:51:26,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2818920.0, ans=0.0 2024-08-14 19:51:31,027 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 17 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-14 19:51:43,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2819020.0, ans=0.125 2024-08-14 19:51:49,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2819120.0, ans=0.2 2024-08-14 19:51:51,662 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-14 19:52:05,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2024-08-14 19:52:09,924 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-14 19:52:18,367 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-14 19:52:23,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2819320.0, ans=0.0 2024-08-14 19:52:29,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. 
limit=15.0 2024-08-14 19:52:35,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2819420.0, ans=0.05 2024-08-14 19:52:36,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6600, loss[loss=0.1099, beats_loss=0.01111, ecapa_loss=0.0001392, whisper_loss=0.09741, over 23099.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001548, whisper_loss=0.09179, over 3907253.14 frames. ], batch size: 92, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:52:50,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-14 19:52:51,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2819520.0, ans=0.2 2024-08-14 19:52:51,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2819520.0, ans=0.0 2024-08-14 19:52:54,336 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 19:53:13,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.460e+01 2.689e+01 3.191e+01 5.178e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-14 19:53:19,113 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 19:53:45,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2819820.0, ans=0.125 2024-08-14 19:53:48,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.05 vs. 
limit=12.0 2024-08-14 19:53:55,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6650, loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.000182, whisper_loss=0.09008, over 21177.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01066, ecapa_loss=0.0001541, whisper_loss=0.09207, over 3919945.00 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:54:05,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2819920.0, ans=0.0 2024-08-14 19:54:13,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2820020.0, ans=0.09899494936611666 2024-08-14 19:54:33,981 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 19:54:53,272 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-14 19:54:59,731 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 19:55:15,207 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6700, loss[loss=0.08936, beats_loss=0.01321, ecapa_loss=0.0001176, whisper_loss=0.07497, over 22092.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01067, ecapa_loss=0.0001538, whisper_loss=0.09217, over 3926809.43 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:55:15,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2820420.0, ans=0.04949747468305833 2024-08-14 19:55:20,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2820420.0, ans=0.125 2024-08-14 19:55:33,833 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
17 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 19:55:46,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-14 19:55:47,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2820620.0, ans=0.2 2024-08-14 19:55:50,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.346e+01 2.527e+01 2.810e+01 4.755e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-14 19:55:53,689 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-14 19:56:22,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2820820.0, ans=0.0 2024-08-14 19:56:32,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6750, loss[loss=0.1136, beats_loss=0.009058, ecapa_loss=0.0002033, whisper_loss=0.1025, over 19166.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01063, ecapa_loss=0.0001532, whisper_loss=0.09191, over 3876792.90 frames. ], batch size: 83, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:56:48,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2821020.0, ans=0.2 2024-08-14 19:56:50,384 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 19:56:50,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2821020.0, ans=0.125 2024-08-14 19:57:11,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2821120.0, ans=0.0 2024-08-14 19:57:50,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6800, loss[loss=0.08211, beats_loss=0.009977, ecapa_loss=0.000173, whisper_loss=0.0704, over 19160.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001542, whisper_loss=0.0917, over 3866469.93 frames. ], batch size: 79, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:57:59,890 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 19:58:02,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-08-14 19:58:08,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2821520.0, ans=0.0 2024-08-14 19:58:12,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2821520.0, ans=0.125 2024-08-14 19:58:22,781 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 19:58:27,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.395e+01 2.601e+01 3.094e+01 9.420e+01, threshold=5.202e+01, percent-clipped=3.0 2024-08-14 19:58:35,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2821620.0, ans=0.1 2024-08-14 19:58:40,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2821720.0, ans=0.0 2024-08-14 19:58:46,686 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 19:58:49,143 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 19:58:59,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2821820.0, ans=0.0 2024-08-14 19:59:08,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6850, loss[loss=0.08923, beats_loss=0.01324, ecapa_loss=0.0001185, whisper_loss=0.07481, over 19372.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.000155, whisper_loss=0.09096, over 3806788.46 frames. 
], batch size: 76, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 19:59:14,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2821920.0, ans=0.125 2024-08-14 19:59:35,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2822020.0, ans=0.1 2024-08-14 19:59:38,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2822120.0, ans=0.2 2024-08-14 19:59:41,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2822120.0, ans=0.125 2024-08-14 20:00:23,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6900, loss[loss=0.1252, beats_loss=0.008689, ecapa_loss=0.0001212, whisper_loss=0.1153, over 16707.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001547, whisper_loss=0.09063, over 3804710.04 frames. ], batch size: 60, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:00:44,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2822520.0, ans=0.0 2024-08-14 20:00:59,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.271e+01 2.537e+01 2.771e+01 4.123e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-14 20:01:10,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.29 vs. 
limit=15.0 2024-08-14 20:01:12,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2822720.0, ans=0.0 2024-08-14 20:01:18,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2822720.0, ans=0.125 2024-08-14 20:01:25,367 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-14 20:01:40,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 6950, loss[loss=0.1114, beats_loss=0.008422, ecapa_loss=0.0001787, whisper_loss=0.1012, over 15897.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001537, whisper_loss=0.09047, over 3814198.69 frames. ], batch size: 63, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:01:45,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=12.0 2024-08-14 20:01:53,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2822920.0, ans=0.125 2024-08-14 20:01:56,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.86 vs. 
limit=15.0 2024-08-14 20:01:57,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2823020.0, ans=0.2 2024-08-14 20:02:16,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2823120.0, ans=0.07 2024-08-14 20:02:27,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2823220.0, ans=0.125 2024-08-14 20:02:27,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2024-08-14 20:02:28,319 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 33 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 20:02:30,023 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.016e-02 2024-08-14 20:02:34,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2823220.0, ans=0.1 2024-08-14 20:02:55,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7000, loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001719, whisper_loss=0.09084, over 16481.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001541, whisper_loss=0.09137, over 3839159.85 frames. 
], batch size: 67, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:03:11,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2823520.0, ans=0.125 2024-08-14 20:03:16,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2823520.0, ans=0.0 2024-08-14 20:03:19,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2823520.0, ans=0.125 2024-08-14 20:03:29,702 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.376e+01 2.618e+01 2.959e+01 4.269e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-14 20:03:35,793 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 20:03:36,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2823620.0, ans=0.04949747468305833 2024-08-14 20:03:40,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2823720.0, ans=0.0 2024-08-14 20:03:44,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2823720.0, ans=0.0 2024-08-14 20:03:46,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2823720.0, ans=0.1 2024-08-14 20:03:49,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.10 vs. 
limit=15.0 2024-08-14 20:03:56,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2823820.0, ans=0.125 2024-08-14 20:03:59,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5 2024-08-14 20:04:09,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7050, loss[loss=0.09883, beats_loss=0.01202, ecapa_loss=0.0001606, whisper_loss=0.08521, over 21594.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001539, whisper_loss=0.09132, over 3854610.88 frames. ], batch size: 88, lr: 3.12e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:04:14,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2823920.0, ans=0.125 2024-08-14 20:04:17,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2823920.0, ans=0.0 2024-08-14 20:04:33,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2824020.0, ans=0.0 2024-08-14 20:04:36,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2824020.0, ans=0.125 2024-08-14 20:05:24,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7100, loss[loss=0.115, beats_loss=0.008844, ecapa_loss=0.0001477, whisper_loss=0.1047, over 18514.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001528, whisper_loss=0.09116, over 3824860.06 frames. ], batch size: 71, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:05:25,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2824420.0, ans=0.0 2024-08-14 20:05:26,518 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 20:05:41,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2824520.0, ans=0.125 2024-08-14 20:05:42,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-14 20:05:46,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2024-08-14 20:05:55,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2824620.0, ans=0.125 2024-08-14 20:05:56,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2824620.0, ans=0.125 2024-08-14 20:06:00,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.271e+01 2.594e+01 2.929e+01 4.373e+01, threshold=5.188e+01, percent-clipped=0.0 2024-08-14 20:06:01,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2824620.0, ans=0.125 2024-08-14 20:06:02,249 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 20:06:04,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2824620.0, ans=0.0 2024-08-14 20:06:10,909 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-14 20:06:18,467 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 20:06:22,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.64 vs. 
limit=22.5 2024-08-14 20:06:26,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2824820.0, ans=0.125 2024-08-14 20:06:29,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2824820.0, ans=0.125 2024-08-14 20:06:37,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.51 vs. limit=10.0 2024-08-14 20:06:38,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7150, loss[loss=0.1056, beats_loss=0.008065, ecapa_loss=0.000167, whisper_loss=0.0959, over 17333.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.09114, over 3843292.50 frames. ], batch size: 70, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:06:47,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-14 20:06:58,031 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-14 20:06:59,685 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-14 20:07:07,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2825120.0, ans=0.125 2024-08-14 20:07:11,696 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 40 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-14 20:07:33,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2825220.0, ans=0.95 2024-08-14 20:07:39,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.35 vs. 
limit=15.0 2024-08-14 20:07:41,518 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 20:07:45,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-08-14 20:07:45,927 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-14 20:07:46,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2825320.0, ans=0.125 2024-08-14 20:07:48,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=12.0 2024-08-14 20:07:53,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7200, loss[loss=0.08839, beats_loss=0.01339, ecapa_loss=0.0001428, whisper_loss=0.07358, over 22476.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001517, whisper_loss=0.09144, over 3886339.31 frames. ], batch size: 93, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:07:57,982 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 20:08:12,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2825520.0, ans=0.0 2024-08-14 20:08:16,743 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-14 20:08:19,530 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 20:08:27,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.392e+01 2.665e+01 3.083e+01 4.439e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-14 20:08:30,977 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
25 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-14 20:08:33,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-08-14 20:08:53,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=2825820.0, ans=0.02 2024-08-14 20:08:56,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825820.0, ans=0.1 2024-08-14 20:09:06,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7250, loss[loss=0.1143, beats_loss=0.008533, ecapa_loss=0.0001879, whisper_loss=0.1039, over 16249.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01057, ecapa_loss=0.0001531, whisper_loss=0.09209, over 3882302.25 frames. ], batch size: 65, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:09:27,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2826020.0, ans=0.0 2024-08-14 20:09:51,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2024-08-14 20:10:03,901 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-14 20:10:04,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2826320.0, ans=0.0 2024-08-14 20:10:13,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2826320.0, ans=0.125 2024-08-14 20:10:19,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7300, loss[loss=0.1125, beats_loss=0.009104, ecapa_loss=0.00015, whisper_loss=0.1019, over 14230.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01054, ecapa_loss=0.0001538, whisper_loss=0.09245, over 3868534.49 frames. ], batch size: 54, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:11:16,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2826520.0, ans=0.125 2024-08-14 20:11:25,207 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-14 20:11:26,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.413e+01 2.619e+01 3.021e+01 6.286e+01, threshold=5.238e+01, percent-clipped=1.0 2024-08-14 20:11:40,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2024-08-14 20:11:41,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2826720.0, ans=0.0 2024-08-14 20:11:52,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.64 vs. limit=15.0 2024-08-14 20:12:04,221 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7350, loss[loss=0.09136, beats_loss=0.01266, ecapa_loss=0.0001195, whisper_loss=0.0775, over 21990.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001544, whisper_loss=0.09154, over 3873689.73 frames. 
], batch size: 87, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:12:07,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2826920.0, ans=0.125 2024-08-14 20:12:15,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2826920.0, ans=0.0 2024-08-14 20:12:19,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-08-14 20:12:25,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2024-08-14 20:12:38,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2827120.0, ans=0.1 2024-08-14 20:12:44,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-14 20:12:51,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2827220.0, ans=0.125 2024-08-14 20:12:57,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2827220.0, ans=0.125 2024-08-14 20:13:03,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2827220.0, ans=0.125 2024-08-14 20:13:04,243 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-14 20:13:21,556 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7400, loss[loss=0.1149, beats_loss=0.008466, ecapa_loss=0.0001539, whisper_loss=0.1049, over 23205.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01063, ecapa_loss=0.0001536, whisper_loss=0.09103, over 3885759.96 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:13:31,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-14 20:13:32,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2827420.0, ans=0.1 2024-08-14 20:13:35,547 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 20:13:40,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2827520.0, ans=0.0 2024-08-14 20:13:59,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.369e+01 2.692e+01 3.042e+01 1.751e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-14 20:14:31,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2827820.0, ans=0.125 2024-08-14 20:14:40,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7450, loss[loss=0.07849, beats_loss=0.01228, ecapa_loss=0.0001565, whisper_loss=0.06464, over 17346.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001536, whisper_loss=0.09116, over 3894853.93 frames. ], batch size: 71, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:15:03,596 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 20:15:22,017 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-14 20:15:22,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2828120.0, ans=0.125 2024-08-14 20:15:42,743 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 20:15:44,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2828220.0, ans=0.125 2024-08-14 20:15:51,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2828220.0, ans=0.0 2024-08-14 20:15:52,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2828220.0, ans=0.0 2024-08-14 20:16:11,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2828320.0, ans=0.125 2024-08-14 20:16:15,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7500, loss[loss=0.1227, beats_loss=0.006995, ecapa_loss=0.0002073, whisper_loss=0.1137, over 15859.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01049, ecapa_loss=0.0001541, whisper_loss=0.09181, over 3901357.43 frames. ], batch size: 65, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:16:16,989 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-14 20:16:23,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2828420.0, ans=0.1 2024-08-14 20:16:54,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. 
limit=15.0 2024-08-14 20:16:55,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2828620.0, ans=0.025 2024-08-14 20:17:00,469 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 20:17:01,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.331e+01 2.565e+01 2.874e+01 3.652e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-14 20:17:09,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-14 20:17:14,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2828720.0, ans=0.1 2024-08-14 20:17:32,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2828820.0, ans=0.0 2024-08-14 20:17:42,178 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-14 20:17:45,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2024-08-14 20:17:51,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7550, loss[loss=0.06806, beats_loss=0.01423, ecapa_loss=0.0001224, whisper_loss=0.0526, over 22865.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001531, whisper_loss=0.09128, over 3869066.89 frames. ], batch size: 94, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:17:51,789 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-14 20:17:53,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2024-08-14 20:17:55,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-14 20:18:10,407 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 20:18:20,422 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-14 20:18:38,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2829120.0, ans=0.125 2024-08-14 20:18:43,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2829120.0, ans=0.2 2024-08-14 20:19:09,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2829320.0, ans=0.1 2024-08-14 20:19:17,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0 2024-08-14 20:19:18,444 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-14 20:19:25,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7600, loss[loss=0.1051, beats_loss=0.009025, ecapa_loss=0.0001409, whisper_loss=0.09469, over 18498.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01055, ecapa_loss=0.0001545, whisper_loss=0.0914, over 3843925.14 frames. 
], batch size: 71, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:19:31,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.0 2024-08-14 20:19:38,540 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 from AS 2024-08-14 20:19:38,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2829420.0, ans=0.0 2024-08-14 20:19:44,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=12.0 2024-08-14 20:20:08,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.345e+01 2.622e+01 3.093e+01 1.598e+02, threshold=5.244e+01, percent-clipped=3.0 2024-08-14 20:20:09,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2829620.0, ans=0.125 2024-08-14 20:20:11,957 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 from AS 2024-08-14 20:20:23,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-14 20:20:23,820 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS 2024-08-14 20:20:25,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2829720.0, ans=0.125 2024-08-14 20:20:46,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7650, loss[loss=0.09711, beats_loss=0.00825, ecapa_loss=0.0001933, whisper_loss=0.08693, over 17563.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001556, whisper_loss=0.0913, over 3865988.94 frames. 
], batch size: 72, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:20:48,359 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 from AS 2024-08-14 20:20:48,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2829920.0, ans=0.05 2024-08-14 20:20:50,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2829920.0, ans=0.125 2024-08-14 20:21:03,052 WARNING [optim.py:496] (3/4) Scaling gradients by 0.061782095581293106, model_norm_threshold=52.43657684326172 2024-08-14 20:21:03,236 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.640e+05, grad_sumsq=1.648e+07, orig_rms_sq=9.952e-03 2024-08-14 20:21:06,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2830020.0, ans=0.125 2024-08-14 20:21:19,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2830120.0, ans=0.0 2024-08-14 20:21:22,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2830120.0, ans=0.125 2024-08-14 20:21:33,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2830220.0, ans=0.0 2024-08-14 20:21:43,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.59 vs. 
limit=22.5 2024-08-14 20:21:46,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2830320.0, ans=0.0 2024-08-14 20:21:53,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-08-14 20:21:57,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7700, loss[loss=0.1097, beats_loss=0.009, ecapa_loss=0.0001783, whisper_loss=0.09891, over 21339.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001546, whisper_loss=0.09075, over 3879296.15 frames. ], batch size: 91, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:22:30,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.409e+01 2.589e+01 2.990e+01 8.487e+02, threshold=5.178e+01, percent-clipped=3.0 2024-08-14 20:22:38,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2830720.0, ans=0.1 2024-08-14 20:22:42,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2830720.0, ans=0.0 2024-08-14 20:22:46,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.25 vs. limit=6.0 2024-08-14 20:22:47,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2830720.0, ans=0.04949747468305833 2024-08-14 20:23:03,203 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:23:05,593 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 14 from Vox, 26 from AS 2024-08-14 20:23:08,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7750, loss[loss=0.1153, beats_loss=0.008167, ecapa_loss=0.0001786, whisper_loss=0.1053, over 17719.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001539, whisper_loss=0.09077, over 3907209.73 frames. ], batch size: 69, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:23:10,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2830920.0, ans=0.125 2024-08-14 20:23:10,418 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.083e+00 2024-08-14 20:23:11,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2830920.0, ans=6.0 2024-08-14 20:23:18,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830920.0, ans=0.1 2024-08-14 20:23:21,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2831020.0, ans=0.0 2024-08-14 20:23:28,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2831020.0, ans=0.125 2024-08-14 20:23:45,707 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 27 from Vox, 42 from AS 2024-08-14 20:23:48,649 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-14 20:23:51,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2831220.0, ans=0.1 2024-08-14 20:24:04,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-14 20:24:14,398 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 from AS 2024-08-14 20:24:18,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2831420.0, ans=0.125 2024-08-14 20:24:19,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7800, loss[loss=0.07067, beats_loss=0.01401, ecapa_loss=0.0001531, whisper_loss=0.05514, over 22845.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001539, whisper_loss=0.09076, over 3902784.48 frames. ], batch size: 96, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:24:23,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2831420.0, ans=0.125 2024-08-14 20:24:28,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2831420.0, ans=0.0 2024-08-14 20:24:28,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2831420.0, ans=0.125 2024-08-14 20:24:30,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2831420.0, ans=0.2 2024-08-14 20:24:31,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.29 vs. 
limit=22.5 2024-08-14 20:24:34,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2831520.0, ans=0.0 2024-08-14 20:24:40,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2831520.0, ans=0.0 2024-08-14 20:24:54,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.359e+01 2.580e+01 2.928e+01 4.088e+01, threshold=5.160e+01, percent-clipped=0.0 2024-08-14 20:24:55,039 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 31 from Vox, 36 from AS 2024-08-14 20:25:20,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2831820.0, ans=0.125 2024-08-14 20:25:32,183 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7850, loss[loss=0.1076, beats_loss=0.008689, ecapa_loss=0.0001541, whisper_loss=0.09742, over 20152.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001536, whisper_loss=0.09051, over 3930154.16 frames. ], batch size: 79, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:25:38,284 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 from AS 2024-08-14 20:25:38,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2831920.0, ans=0.2 2024-08-14 20:26:15,459 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 from AS 2024-08-14 20:26:18,311 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
25 from LS+wenet, 24 from Vox, 30 from AS 2024-08-14 20:26:22,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2832220.0, ans=0.125 2024-08-14 20:26:41,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2832320.0, ans=0.0 2024-08-14 20:26:41,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. limit=10.0 2024-08-14 20:26:41,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.02 vs. limit=22.5 2024-08-14 20:26:43,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7900, loss[loss=0.1199, beats_loss=0.009591, ecapa_loss=0.0001625, whisper_loss=0.1087, over 14524.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001524, whisper_loss=0.09077, over 3950143.47 frames. 
], batch size: 56, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:26:45,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2832420.0, ans=0.125 2024-08-14 20:26:49,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2832420.0, ans=0.0 2024-08-14 20:26:53,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2832420.0, ans=0.0 2024-08-14 20:27:10,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2832520.0, ans=0.125 2024-08-14 20:27:13,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2832620.0, ans=0.5 2024-08-14 20:27:18,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.337e+01 2.582e+01 2.870e+01 4.311e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-14 20:27:38,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-14 20:27:42,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2832820.0, ans=0.125 2024-08-14 20:27:56,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 7950, loss[loss=0.09531, beats_loss=0.01069, ecapa_loss=0.0001557, whisper_loss=0.08307, over 22447.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01083, ecapa_loss=0.0001533, whisper_loss=0.09021, over 3917669.79 frames. 
], batch size: 91, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:27:57,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2832920.0, ans=0.125 2024-08-14 20:28:14,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2833020.0, ans=0.125 2024-08-14 20:28:21,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-14 20:28:40,439 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-14 20:28:49,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2833220.0, ans=0.125 2024-08-14 20:28:51,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-14 20:29:05,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=12.0 2024-08-14 20:29:09,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8000, loss[loss=0.07544, beats_loss=0.01208, ecapa_loss=0.0001579, whisper_loss=0.06178, over 13857.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01077, ecapa_loss=0.0001525, whisper_loss=0.0902, over 3889968.79 frames. 
], batch size: 57, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:29:09,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2833420.0, ans=0.125 2024-08-14 20:29:14,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2833420.0, ans=0.0 2024-08-14 20:29:26,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2833520.0, ans=0.125 2024-08-14 20:29:29,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2833520.0, ans=0.05 2024-08-14 20:29:30,909 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 from AS 2024-08-14 20:29:39,656 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:29:43,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.316e+01 2.668e+01 3.025e+01 4.748e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-14 20:29:46,656 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 from AS 2024-08-14 20:30:00,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2833720.0, ans=0.125 2024-08-14 20:30:05,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2833820.0, ans=0.125 2024-08-14 20:30:20,485 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8050, loss[loss=0.1204, beats_loss=0.009268, ecapa_loss=0.000187, whisper_loss=0.1092, over 16694.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.09052, over 3870526.05 frames. 
], batch size: 67, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:30:26,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2833920.0, ans=0.125 2024-08-14 20:30:40,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=22.5 2024-08-14 20:30:41,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2834020.0, ans=0.0 2024-08-14 20:31:05,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2834220.0, ans=0.04949747468305833 2024-08-14 20:31:30,662 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 from AS 2024-08-14 20:31:31,739 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8100, loss[loss=0.09403, beats_loss=0.01035, ecapa_loss=0.0001533, whisper_loss=0.08215, over 16464.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001524, whisper_loss=0.09014, over 3892857.89 frames. ], batch size: 68, lr: 3.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:31:35,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-08-14 20:31:41,727 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 14 from Vox, 23 from AS 2024-08-14 20:31:50,767 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 from AS 2024-08-14 20:31:53,620 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 20:32:04,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2834620.0, ans=0.0 2024-08-14 20:32:04,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-08-14 20:32:06,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.290e+01 2.522e+01 2.889e+01 4.208e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-14 20:32:07,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2834620.0, ans=0.125 2024-08-14 20:32:09,669 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 33 from LS+wenet, 18 from Vox, 26 from AS 2024-08-14 20:32:17,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2834720.0, ans=0.2 2024-08-14 20:32:45,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8150, loss[loss=0.08278, beats_loss=0.009104, ecapa_loss=0.0001732, whisper_loss=0.07194, over 13979.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001526, whisper_loss=0.09011, over 3903671.58 frames. ], batch size: 54, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:32:54,359 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 from AS 2024-08-14 20:33:11,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2835020.0, ans=0.0 2024-08-14 20:33:58,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8200, loss[loss=0.1058, beats_loss=0.01074, ecapa_loss=0.0001523, whisper_loss=0.09349, over 14946.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001524, whisper_loss=0.09012, over 3909324.95 frames. 
], batch size: 58, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:34:01,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2835420.0, ans=0.0 2024-08-14 20:34:24,740 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 from AS 2024-08-14 20:34:32,432 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 from AS 2024-08-14 20:34:33,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.288e+01 2.494e+01 2.883e+01 1.855e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-14 20:34:34,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835620.0, ans=0.125 2024-08-14 20:34:58,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2835820.0, ans=0.0 2024-08-14 20:35:10,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8250, loss[loss=0.0964, beats_loss=0.009838, ecapa_loss=0.0001815, whisper_loss=0.08475, over 16252.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001529, whisper_loss=0.08994, over 3912441.12 frames. ], batch size: 65, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:35:15,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2835920.0, ans=0.0 2024-08-14 20:35:25,688 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 27 from Vox, 32 from AS 2024-08-14 20:35:40,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2836120.0, ans=0.0 2024-08-14 20:36:13,211 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
24 from LS+wenet, 9 from Vox, 23 from AS 2024-08-14 20:36:17,825 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:36:23,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8300, loss[loss=0.1121, beats_loss=0.01106, ecapa_loss=0.0001356, whisper_loss=0.09967, over 23339.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001529, whisper_loss=0.08993, over 3904715.93 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:36:29,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2836420.0, ans=0.0 2024-08-14 20:36:39,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2836520.0, ans=0.125 2024-08-14 20:36:41,381 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:36:45,167 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 from AS 2024-08-14 20:36:57,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.392e+01 2.726e+01 3.062e+01 2.103e+02, threshold=5.453e+01, percent-clipped=2.0 2024-08-14 20:37:17,619 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 28 from Vox, 37 from AS 2024-08-14 20:37:18,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2836720.0, ans=0.1 2024-08-14 20:37:29,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2836820.0, ans=0.1 2024-08-14 20:37:34,525 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8350, loss[loss=0.0995, beats_loss=0.009636, ecapa_loss=0.0002521, whisper_loss=0.08734, over 17236.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001531, whisper_loss=0.09042, over 3914651.52 frames. ], batch size: 80, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:37:34,675 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 from AS 2024-08-14 20:37:43,483 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 from AS 2024-08-14 20:37:52,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-14 20:37:58,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2837020.0, ans=0.125 2024-08-14 20:38:00,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-14 20:38:00,804 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 from AS 2024-08-14 20:38:01,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2837020.0, ans=0.2 2024-08-14 20:38:04,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2837120.0, ans=0.2 2024-08-14 20:38:19,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2837220.0, ans=0.2 2024-08-14 20:38:30,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-14 20:38:46,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8400, loss[loss=0.1183, beats_loss=0.01054, ecapa_loss=0.000146, whisper_loss=0.1063, over 22880.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001532, whisper_loss=0.09103, over 3933404.04 frames. ], batch size: 87, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:38:47,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2837420.0, ans=0.0 2024-08-14 20:38:52,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2024-08-14 20:39:01,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-14 20:39:03,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2837520.0, ans=0.1 2024-08-14 20:39:10,908 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 from AS 2024-08-14 20:39:17,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2837620.0, ans=0.125 2024-08-14 20:39:22,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.308e+01 2.540e+01 2.813e+01 3.907e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 20:39:33,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2837720.0, ans=0.125 2024-08-14 20:39:37,166 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 24 from Vox, 24 from AS 2024-08-14 20:39:38,540 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 31 from Vox, 33 from AS 2024-08-14 20:39:48,951 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 20:39:59,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8450, loss[loss=0.08281, beats_loss=0.01236, ecapa_loss=0.0001461, whisper_loss=0.06899, over 20634.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001535, whisper_loss=0.0911, over 3906052.53 frames. ], batch size: 87, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:40:14,520 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 20:40:28,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2838120.0, ans=0.125 2024-08-14 20:40:44,360 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 from AS 2024-08-14 20:41:00,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2838320.0, ans=0.125 2024-08-14 20:41:06,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2838320.0, ans=0.2 2024-08-14 20:41:11,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8500, loss[loss=0.1035, beats_loss=0.01139, ecapa_loss=0.0001412, whisper_loss=0.09071, over 16784.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001539, whisper_loss=0.0908, over 3905887.38 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:41:17,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2838420.0, ans=0.125 2024-08-14 20:41:33,184 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 18 from Vox, 48 from AS 2024-08-14 20:41:36,164 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 from AS 2024-08-14 20:41:45,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.375e+01 2.644e+01 3.031e+01 3.106e+02, threshold=5.288e+01, percent-clipped=1.0 2024-08-14 20:41:46,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2838620.0, ans=0.2 2024-08-14 20:41:51,474 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 from AS 2024-08-14 20:41:54,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2838720.0, ans=0.1 2024-08-14 20:42:22,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=12.0 2024-08-14 20:42:22,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8550, loss[loss=0.1097, beats_loss=0.008528, ecapa_loss=0.0001757, whisper_loss=0.09938, over 21857.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001537, whisper_loss=0.09155, over 3915872.25 frames. ], batch size: 90, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:42:33,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2838920.0, ans=0.125 2024-08-14 20:42:40,133 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 17 from Vox, 44 from AS 2024-08-14 20:42:41,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2839020.0, ans=0.0 2024-08-14 20:42:57,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2839120.0, ans=0.09899494936611666 2024-08-14 20:43:07,622 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS 2024-08-14 20:43:12,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=2839220.0, ans=0.02 2024-08-14 20:43:14,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0 2024-08-14 20:43:16,810 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 from AS 2024-08-14 20:43:22,185 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 from AS 2024-08-14 20:43:35,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8600, loss[loss=0.1082, beats_loss=0.008629, ecapa_loss=0.0001818, whisper_loss=0.09772, over 19303.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001528, whisper_loss=0.09135, over 3892954.18 frames. ], batch size: 81, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:43:39,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2839420.0, ans=0.125 2024-08-14 20:43:40,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2839420.0, ans=0.125 2024-08-14 20:43:41,628 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-14 20:43:51,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-14 20:44:10,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.454e+01 2.758e+01 3.025e+01 4.750e+01, threshold=5.517e+01, percent-clipped=0.0 2024-08-14 20:44:13,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=15.0 2024-08-14 20:44:15,789 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-14 20:44:26,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2839720.0, ans=0.1 2024-08-14 20:44:29,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2839720.0, ans=0.0 2024-08-14 20:44:49,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8650, loss[loss=0.1127, beats_loss=0.01021, ecapa_loss=0.0001373, whisper_loss=0.1011, over 22372.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.000153, whisper_loss=0.09089, over 3850508.15 frames. ], batch size: 87, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:44:51,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2024-08-14 20:44:51,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.19 vs. 
limit=22.5 2024-08-14 20:45:03,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2839920.0, ans=0.125 2024-08-14 20:45:08,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2840020.0, ans=0.015 2024-08-14 20:45:30,852 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-14 20:45:36,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2840220.0, ans=0.2 2024-08-14 20:45:54,070 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-14 20:45:58,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2840320.0, ans=0.125 2024-08-14 20:46:05,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8700, loss[loss=0.1181, beats_loss=0.009013, ecapa_loss=0.0001578, whisper_loss=0.1075, over 20674.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001535, whisper_loss=0.09146, over 3855603.08 frames. ], batch size: 82, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:46:05,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2840420.0, ans=0.2 2024-08-14 20:46:11,360 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 20:46:17,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2840420.0, ans=0.0 2024-08-14 20:46:24,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2840520.0, ans=0.125 2024-08-14 20:46:39,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.465e+01 2.655e+01 3.081e+01 6.274e+01, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:46:42,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-08-14 20:46:44,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2840620.0, ans=0.0 2024-08-14 20:47:08,772 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 20:47:16,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2840920.0, ans=0.0 2024-08-14 20:47:17,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8750, loss[loss=0.07215, beats_loss=0.01143, ecapa_loss=0.0001873, whisper_loss=0.05885, over 15255.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001537, whisper_loss=0.09111, over 3862665.81 frames. ], batch size: 62, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:47:19,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0 2024-08-14 20:47:29,103 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 20:47:32,990 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
33 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 20:47:35,825 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 20:47:44,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2841120.0, ans=0.125 2024-08-14 20:48:13,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2841220.0, ans=0.125 2024-08-14 20:48:29,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8800, loss[loss=0.1105, beats_loss=0.009482, ecapa_loss=0.0001658, whisper_loss=0.09937, over 14729.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01063, ecapa_loss=0.0001527, whisper_loss=0.09121, over 3871495.58 frames. ], batch size: 59, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:48:34,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2841420.0, ans=15.0 2024-08-14 20:48:38,069 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 20:48:43,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.83 vs. limit=5.0 2024-08-14 20:48:51,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2841520.0, ans=0.125 2024-08-14 20:48:57,107 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
23 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-14 20:49:00,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2841620.0, ans=0.125 2024-08-14 20:49:05,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.267e+01 2.535e+01 2.766e+01 4.137e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-14 20:49:11,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841620.0, ans=0.1 2024-08-14 20:49:18,683 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 12 from Vox, 50 fro AS 2024-08-14 20:49:32,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2024-08-14 20:49:43,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8850, loss[loss=0.06811, beats_loss=0.01387, ecapa_loss=0.0001347, whisper_loss=0.05289, over 22191.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001521, whisper_loss=0.09046, over 3874316.46 frames. ], batch size: 94, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:49:59,497 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 23 from Vox, 16 fro AS 2024-08-14 20:50:04,993 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 18 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-14 20:50:14,715 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 20:50:30,273 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 20:50:40,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2842320.0, ans=0.05 2024-08-14 20:50:44,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2842320.0, ans=0.125 2024-08-14 20:50:46,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-14 20:50:54,284 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8900, loss[loss=0.1011, beats_loss=0.01166, ecapa_loss=0.0001401, whisper_loss=0.08805, over 23200.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01078, ecapa_loss=0.000152, whisper_loss=0.09, over 3863767.18 frames. ], batch size: 94, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:50:54,477 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-14 20:50:56,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=12.0 2024-08-14 20:51:26,445 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 20:51:28,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-14 20:51:29,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.335e+01 2.555e+01 2.826e+01 4.520e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-14 20:51:30,868 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-14 20:51:31,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2842620.0, ans=0.125 2024-08-14 20:51:54,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2842820.0, ans=0.025 2024-08-14 20:52:01,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=12.0 2024-08-14 20:52:06,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 8950, loss[loss=0.09943, beats_loss=0.01151, ecapa_loss=0.0001789, whisper_loss=0.08613, over 21254.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001526, whisper_loss=0.09073, over 3886967.47 frames. ], batch size: 92, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:52:08,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2842920.0, ans=0.0 2024-08-14 20:52:10,746 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 20:52:13,692 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-14 20:52:20,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2843020.0, ans=0.1 2024-08-14 20:52:40,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2843120.0, ans=0.2 2024-08-14 20:52:49,036 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.693e+05 2024-08-14 20:52:49,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. 
limit=15.0 2024-08-14 20:53:18,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9000, loss[loss=0.09682, beats_loss=0.01209, ecapa_loss=0.0001319, whisper_loss=0.08342, over 21295.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001532, whisper_loss=0.08991, over 3884298.78 frames. ], batch size: 84, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:53:18,864 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 20:54:01,045 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005268, whisper_loss=0.2474, over 922467.00 frames. 2024-08-14 20:54:16,980 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-14 20:56:16,704 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on AT_audioset: loss=0.0236, beats_loss=0.0236, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 20:56:16,709 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 20:56:17,166 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 20:56:24,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2843420.0, ans=0.0 2024-08-14 20:56:29,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2843420.0, ans=0.1 2024-08-14 20:56:31,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0 2024-08-14 20:56:33,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2843520.0, ans=0.2 2024-08-14 20:56:37,466 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-14 20:56:51,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0 2024-08-14 20:56:51,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.250e+01 2.510e+01 2.872e+01 4.631e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-14 20:56:54,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2843620.0, ans=0.2 2024-08-14 20:57:20,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2843820.0, ans=0.125 2024-08-14 20:57:29,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9050, loss[loss=0.07203, beats_loss=0.01083, ecapa_loss=0.0001429, whisper_loss=0.05977, over 15662.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001533, whisper_loss=0.09091, over 3886238.09 frames. ], batch size: 60, lr: 3.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 20:57:30,203 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-14 20:57:42,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2843920.0, ans=0.0 2024-08-14 20:57:46,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2844020.0, ans=0.1 2024-08-14 20:57:54,801 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-14 20:57:56,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2844020.0, ans=0.0 2024-08-14 20:57:58,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2844120.0, ans=0.07 2024-08-14 20:58:06,709 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 20:58:33,017 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-14 20:58:41,888 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-14 20:58:43,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9100, loss[loss=0.09841, beats_loss=0.008789, ecapa_loss=0.0002261, whisper_loss=0.08736, over 15354.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001547, whisper_loss=0.09116, over 3899775.21 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:58:51,759 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 20:59:08,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2844520.0, ans=0.05 2024-08-14 20:59:17,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.420e+01 2.655e+01 2.997e+01 1.110e+02, threshold=5.311e+01, percent-clipped=1.0 2024-08-14 20:59:24,934 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
32 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-14 20:59:26,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2844720.0, ans=0.125 2024-08-14 20:59:28,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2844720.0, ans=0.125 2024-08-14 20:59:32,217 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-14 20:59:42,519 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 20:59:45,287 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 20:59:54,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.65 vs. limit=10.0 2024-08-14 20:59:55,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9150, loss[loss=0.09752, beats_loss=0.01201, ecapa_loss=0.0001551, whisper_loss=0.08396, over 21801.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0107, ecapa_loss=0.0001552, whisper_loss=0.09066, over 3911848.96 frames. ], batch size: 86, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 20:59:56,725 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 21:00:04,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-08-14 21:00:40,117 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 21:00:43,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2845220.0, ans=0.125 2024-08-14 21:00:58,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2845320.0, ans=0.125 2024-08-14 21:01:02,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2845320.0, ans=0.2 2024-08-14 21:01:06,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2845420.0, ans=0.0 2024-08-14 21:01:07,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9200, loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001409, whisper_loss=0.09308, over 17846.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01078, ecapa_loss=0.0001554, whisper_loss=0.09097, over 3907711.18 frames. ], batch size: 70, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:01:08,747 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 21:01:20,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-14 21:01:26,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=2845520.0, ans=0.2 2024-08-14 21:01:41,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.415e+01 2.661e+01 2.941e+01 2.596e+02, threshold=5.321e+01, percent-clipped=3.0 2024-08-14 21:01:44,586 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-14 21:02:04,818 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 21:02:15,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2845820.0, ans=0.1 2024-08-14 21:02:17,709 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-14 21:02:18,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9250, loss[loss=0.09959, beats_loss=0.009162, ecapa_loss=0.0001823, whisper_loss=0.0886, over 20801.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001547, whisper_loss=0.09009, over 3891046.17 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:02:30,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2845920.0, ans=0.1 2024-08-14 21:02:48,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2846120.0, ans=0.0 2024-08-14 21:02:49,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2846120.0, ans=0.0 2024-08-14 21:02:52,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2024-08-14 21:03:16,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.28 vs. 
limit=12.0 2024-08-14 21:03:19,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2846320.0, ans=0.0 2024-08-14 21:03:22,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2846320.0, ans=0.125 2024-08-14 21:03:25,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-14 21:03:33,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9300, loss[loss=0.1014, beats_loss=0.009011, ecapa_loss=0.0001602, whisper_loss=0.09079, over 17835.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0108, ecapa_loss=0.0001535, whisper_loss=0.09026, over 3887131.66 frames. ], batch size: 71, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:03:45,598 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 21:04:03,575 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-14 21:04:08,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.351e+01 2.533e+01 2.913e+01 3.870e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-14 21:04:14,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2846620.0, ans=0.0 2024-08-14 21:04:14,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2846620.0, ans=0.125 2024-08-14 21:04:23,436 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-14 21:04:48,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9350, loss[loss=0.1097, beats_loss=0.00941, ecapa_loss=0.0001657, whisper_loss=0.09868, over 23272.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001536, whisper_loss=0.09033, over 3889861.59 frames. ], batch size: 94, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:05:00,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2846920.0, ans=0.125 2024-08-14 21:05:03,912 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:05:12,086 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-14 21:05:29,686 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 21:06:01,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9400, loss[loss=0.1214, beats_loss=0.009911, ecapa_loss=0.0001228, whisper_loss=0.1102, over 24428.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01075, ecapa_loss=0.000153, whisper_loss=0.09066, over 3902750.27 frames. ], batch size: 90, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:06:26,109 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 21:06:37,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-14 21:06:38,194 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.317e+01 2.592e+01 2.927e+01 3.881e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-14 21:06:56,893 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
27 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-14 21:06:58,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2847720.0, ans=0.0 2024-08-14 21:06:58,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2847720.0, ans=0.1 2024-08-14 21:07:03,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-14 21:07:15,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9450, loss[loss=0.08559, beats_loss=0.01423, ecapa_loss=0.0001599, whisper_loss=0.06976, over 18890.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001527, whisper_loss=0.09004, over 3898806.34 frames. ], batch size: 78, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:07:23,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2847920.0, ans=0.0 2024-08-14 21:07:31,526 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 21:07:33,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2848020.0, ans=0.025 2024-08-14 21:07:34,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2848020.0, ans=0.025 2024-08-14 21:07:51,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2848120.0, ans=0.0 2024-08-14 21:07:56,918 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 21:08:07,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2848220.0, ans=0.2 2024-08-14 21:08:07,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2848220.0, ans=0.125 2024-08-14 21:08:11,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2848220.0, ans=0.125 2024-08-14 21:08:17,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2848320.0, ans=0.125 2024-08-14 21:08:24,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2848320.0, ans=0.125 2024-08-14 21:08:28,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9500, loss[loss=0.1353, beats_loss=0.008569, ecapa_loss=0.000147, whisper_loss=0.1252, over 19237.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01082, ecapa_loss=0.0001524, whisper_loss=0.08966, over 3895080.96 frames. ], batch size: 73, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:08:56,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2848520.0, ans=0.2 2024-08-14 21:09:01,014 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 21:09:03,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.327e+01 2.619e+01 2.918e+01 1.778e+02, threshold=5.238e+01, percent-clipped=2.0 2024-08-14 21:09:14,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2848720.0, ans=0.125 2024-08-14 21:09:18,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2848720.0, ans=0.125 2024-08-14 21:09:29,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2848820.0, ans=0.0 2024-08-14 21:09:35,143 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-14 21:09:42,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9550, loss[loss=0.1097, beats_loss=0.01058, ecapa_loss=0.0001486, whisper_loss=0.09766, over 14998.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01079, ecapa_loss=0.0001531, whisper_loss=0.08993, over 3880962.01 frames. ], batch size: 57, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:09:42,356 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-14 21:09:51,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.90 vs. limit=10.0 2024-08-14 21:10:25,867 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 18 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-14 21:10:28,797 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-14 21:10:29,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2849220.0, ans=0.0 2024-08-14 21:10:30,222 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-14 21:10:31,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2849220.0, ans=0.0 2024-08-14 21:10:33,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2849220.0, ans=0.125 2024-08-14 21:10:53,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9600, loss[loss=0.08137, beats_loss=0.01149, ecapa_loss=0.0001509, whisper_loss=0.06837, over 15999.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001518, whisper_loss=0.08986, over 3846666.92 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:11:14,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-14 21:11:16,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2849520.0, ans=0.125 2024-08-14 21:11:22,529 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-14 21:11:25,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2849620.0, ans=0.95 2024-08-14 21:11:29,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.324e+01 2.593e+01 2.905e+01 4.004e+01, threshold=5.186e+01, percent-clipped=0.0 2024-08-14 21:11:34,433 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 21:11:56,461 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 21:11:57,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2849820.0, ans=6.0 2024-08-14 21:12:07,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9650, loss[loss=0.1137, beats_loss=0.01229, ecapa_loss=0.0001513, whisper_loss=0.09992, over 19856.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001531, whisper_loss=0.09126, over 3833109.64 frames. ], batch size: 78, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:12:08,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2849920.0, ans=0.0 2024-08-14 21:12:12,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-14 21:12:24,499 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-14 21:12:37,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2024-08-14 21:12:56,334 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 21:13:05,262 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-14 21:13:16,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.27 vs. limit=10.0 2024-08-14 21:13:20,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9700, loss[loss=0.07858, beats_loss=0.01011, ecapa_loss=0.0001686, whisper_loss=0.06679, over 14080.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001534, whisper_loss=0.09133, over 3871363.52 frames. 
], batch size: 58, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:13:21,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-14 21:13:40,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2850520.0, ans=0.2 2024-08-14 21:13:56,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.346e+01 2.562e+01 2.964e+01 3.831e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-14 21:13:58,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2850620.0, ans=0.2 2024-08-14 21:14:06,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=12.0 2024-08-14 21:14:29,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-14 21:14:34,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9750, loss[loss=0.09872, beats_loss=0.01127, ecapa_loss=0.0001523, whisper_loss=0.08592, over 21081.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001529, whisper_loss=0.09111, over 3856190.17 frames. ], batch size: 88, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:14:44,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2850920.0, ans=0.0 2024-08-14 21:14:54,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.00 vs. 
limit=22.5 2024-08-14 21:14:59,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2851020.0, ans=0.125 2024-08-14 21:15:14,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2851120.0, ans=0.5 2024-08-14 21:15:29,029 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 21:15:31,979 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-14 21:15:36,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2851320.0, ans=0.1 2024-08-14 21:15:44,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2851320.0, ans=0.125 2024-08-14 21:15:49,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9800, loss[loss=0.08618, beats_loss=0.01312, ecapa_loss=0.0001454, whisper_loss=0.07161, over 21775.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01074, ecapa_loss=0.000152, whisper_loss=0.09043, over 3856582.07 frames. ], batch size: 91, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:16:14,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0 2024-08-14 21:16:17,964 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
16 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 21:16:21,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2851620.0, ans=0.07 2024-08-14 21:16:25,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.297e+01 2.616e+01 2.876e+01 8.897e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-14 21:16:48,513 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-14 21:17:03,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9850, loss[loss=0.1093, beats_loss=0.008753, ecapa_loss=0.0002024, whisper_loss=0.09849, over 15086.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01066, ecapa_loss=0.000153, whisper_loss=0.09088, over 3867513.64 frames. ], batch size: 63, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:17:04,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2851920.0, ans=0.125 2024-08-14 21:17:13,140 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 21:17:19,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2852020.0, ans=0.125 2024-08-14 21:17:21,658 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-14 21:17:25,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2852020.0, ans=0.1 2024-08-14 21:17:37,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.65 vs. 
limit=12.0 2024-08-14 21:17:54,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2852220.0, ans=0.125 2024-08-14 21:18:00,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2852220.0, ans=0.2 2024-08-14 21:18:14,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2852320.0, ans=0.0 2024-08-14 21:18:17,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2852420.0, ans=0.09899494936611666 2024-08-14 21:18:18,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9900, loss[loss=0.1105, beats_loss=0.0104, ecapa_loss=0.0001523, whisper_loss=0.09853, over 23526.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001525, whisper_loss=0.09044, over 3889924.75 frames. ], batch size: 94, lr: 3.11e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:18:20,635 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:18:25,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2852420.0, ans=0.07 2024-08-14 21:18:29,676 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 21:18:50,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2852620.0, ans=0.125 2024-08-14 21:18:54,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.391e+01 2.621e+01 2.869e+01 9.364e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 21:19:06,672 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-14 21:19:35,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 9950, loss[loss=0.1113, beats_loss=0.01128, ecapa_loss=0.0001473, whisper_loss=0.09854, over 20415.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001531, whisper_loss=0.09007, over 3883291.37 frames. ], batch size: 80, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:19:53,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2853020.0, ans=0.125 2024-08-14 21:19:56,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2853020.0, ans=0.125 2024-08-14 21:19:58,168 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-14 21:20:13,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-14 21:20:15,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2024-08-14 21:20:21,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2853220.0, ans=0.2 2024-08-14 21:20:37,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.01 vs. limit=10.0 2024-08-14 21:20:47,446 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 21:20:51,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=15.0 2024-08-14 21:20:51,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10000, loss[loss=0.09576, beats_loss=0.01147, ecapa_loss=0.0001443, whisper_loss=0.08284, over 16820.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01074, ecapa_loss=0.0001537, whisper_loss=0.09073, over 3870154.15 frames. ], batch size: 66, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:20:57,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2853420.0, ans=0.0 2024-08-14 21:21:06,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2853520.0, ans=0.125 2024-08-14 21:21:24,578 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-14 21:21:28,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.381e+01 2.626e+01 2.960e+01 1.740e+02, threshold=5.252e+01, percent-clipped=1.0 2024-08-14 21:21:29,352 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 21:21:29,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2853620.0, ans=0.07 2024-08-14 21:21:40,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2853720.0, ans=0.125 2024-08-14 21:21:41,183 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
12 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 21:21:47,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2853720.0, ans=0.1 2024-08-14 21:21:52,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2853820.0, ans=0.2 2024-08-14 21:22:03,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=12.0 2024-08-14 21:22:08,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2853920.0, ans=0.0 2024-08-14 21:22:08,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2024-08-14 21:22:08,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10050, loss[loss=0.1088, beats_loss=0.01104, ecapa_loss=0.0001515, whisper_loss=0.09626, over 16941.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001539, whisper_loss=0.09066, over 3852375.41 frames. ], batch size: 68, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:22:12,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2024-08-14 21:22:32,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2024-08-14 21:22:35,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2854020.0, ans=0.125 2024-08-14 21:22:36,566 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-14 21:22:38,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2854020.0, ans=0.125 2024-08-14 21:22:39,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0 2024-08-14 21:22:41,763 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 21:22:48,158 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 21:23:09,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2854220.0, ans=0.0 2024-08-14 21:23:23,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2854320.0, ans=0.125 2024-08-14 21:23:23,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2854320.0, ans=0.0 2024-08-14 21:23:30,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10100, loss[loss=0.08199, beats_loss=0.01074, ecapa_loss=0.0001446, whisper_loss=0.0698, over 13925.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01068, ecapa_loss=0.0001541, whisper_loss=0.09128, over 3891989.18 frames. ], batch size: 54, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:23:36,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=12.0 2024-08-14 21:23:38,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2854420.0, ans=0.2 2024-08-14 21:23:39,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. 
limit=6.0 2024-08-14 21:23:46,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2854520.0, ans=0.04949747468305833 2024-08-14 21:23:50,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0 2024-08-14 21:23:55,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2854520.0, ans=0.125 2024-08-14 21:23:58,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-14 21:24:01,279 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-14 21:24:09,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2854620.0, ans=0.1 2024-08-14 21:24:10,655 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.362e+01 2.668e+01 2.989e+01 1.433e+02, threshold=5.336e+01, percent-clipped=3.0 2024-08-14 21:24:46,368 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-14 21:24:52,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10150, loss[loss=0.08844, beats_loss=0.01058, ecapa_loss=0.0001845, whisper_loss=0.07602, over 18549.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01071, ecapa_loss=0.0001551, whisper_loss=0.09045, over 3902196.42 frames. ], batch size: 77, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:24:59,198 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 21:25:15,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2855020.0, ans=0.125 2024-08-14 21:25:23,763 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-14 21:25:47,253 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-14 21:25:56,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2855320.0, ans=0.0 2024-08-14 21:26:09,647 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-14 21:26:10,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10200, loss[loss=0.1072, beats_loss=0.0134, ecapa_loss=0.0001521, whisper_loss=0.09232, over 23135.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001551, whisper_loss=0.09065, over 3913702.47 frames. ], batch size: 95, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:26:45,084 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-14 21:26:46,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.377e+01 2.660e+01 3.071e+01 4.492e+01, threshold=5.321e+01, percent-clipped=0.0 2024-08-14 21:27:16,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2855820.0, ans=0.125 2024-08-14 21:27:23,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10250, loss[loss=0.09698, beats_loss=0.01046, ecapa_loss=0.0001218, whisper_loss=0.08529, over 22596.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.000155, whisper_loss=0.09115, over 3902960.42 frames. 
], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:27:27,182 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-14 21:27:55,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-14 21:28:38,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10300, loss[loss=0.08152, beats_loss=0.01105, ecapa_loss=0.000178, whisper_loss=0.06869, over 21192.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001555, whisper_loss=0.09097, over 3901279.00 frames. ], batch size: 89, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:28:51,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2856520.0, ans=0.2 2024-08-14 21:29:01,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2856520.0, ans=0.0 2024-08-14 21:29:01,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-14 21:29:14,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.284e+01 2.585e+01 2.983e+01 4.241e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-14 21:29:30,947 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-14 21:29:34,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2024-08-14 21:29:37,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=15.0 2024-08-14 21:29:38,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2856820.0, ans=0.2 2024-08-14 21:29:45,757 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 21:29:50,314 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-14 21:29:53,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10350, loss[loss=0.09213, beats_loss=0.01275, ecapa_loss=0.0001743, whisper_loss=0.07764, over 16191.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001549, whisper_loss=0.09077, over 3924124.90 frames. ], batch size: 65, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:30:13,739 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-14 21:30:40,903 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 21:31:25,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2857320.0, ans=0.125 2024-08-14 21:31:29,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2857320.0, ans=0.0 2024-08-14 21:31:31,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10400, loss[loss=0.1129, beats_loss=0.01052, ecapa_loss=0.000198, whisper_loss=0.1004, over 17352.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001541, whisper_loss=0.09036, over 3895891.82 frames. 
], batch size: 72, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:31:32,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2857420.0, ans=0.0 2024-08-14 21:32:12,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.60 vs. limit=6.0 2024-08-14 21:32:14,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.400e+01 2.611e+01 2.963e+01 4.216e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-14 21:32:37,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2857720.0, ans=0.125 2024-08-14 21:32:41,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.92 vs. limit=22.5 2024-08-14 21:32:54,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2857820.0, ans=0.125 2024-08-14 21:32:57,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2857820.0, ans=0.125 2024-08-14 21:32:59,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10450, loss[loss=0.1093, beats_loss=0.009764, ecapa_loss=0.0001763, whisper_loss=0.09773, over 22690.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001546, whisper_loss=0.09079, over 3858862.29 frames. 
], batch size: 90, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:33:00,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2857920.0, ans=0.1 2024-08-14 21:33:06,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2857920.0, ans=0.1 2024-08-14 21:33:08,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2857920.0, ans=0.125 2024-08-14 21:33:08,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2857920.0, ans=0.0 2024-08-14 21:33:20,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2858020.0, ans=0.125 2024-08-14 21:33:27,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2858020.0, ans=0.0 2024-08-14 21:33:33,963 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-14 21:33:34,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2858120.0, ans=0.0 2024-08-14 21:33:50,414 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-14 21:33:53,180 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-14 21:34:09,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2858320.0, ans=0.125 2024-08-14 21:34:29,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10500, loss[loss=0.09, beats_loss=0.01195, ecapa_loss=0.0001506, whisper_loss=0.07654, over 18766.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001549, whisper_loss=0.0904, over 3856263.86 frames. ], batch size: 74, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:34:40,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2858420.0, ans=0.125 2024-08-14 21:34:56,090 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-14 21:35:07,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2858620.0, ans=0.125 2024-08-14 21:35:11,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.314e+01 2.587e+01 2.967e+01 4.494e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-14 21:35:11,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2858620.0, ans=0.2 2024-08-14 21:35:13,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-14 21:35:25,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2858720.0, ans=0.2 2024-08-14 21:35:28,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2858720.0, ans=0.125 2024-08-14 21:35:50,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2858820.0, ans=0.0 2024-08-14 21:35:56,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10550, loss[loss=0.08669, beats_loss=0.01048, ecapa_loss=0.0001265, whisper_loss=0.07494, over 14292.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001545, whisper_loss=0.09054, over 3855789.61 frames. 
], batch size: 53, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:36:12,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2859020.0, ans=0.0 2024-08-14 21:36:24,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2859020.0, ans=0.125 2024-08-14 21:36:25,481 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 21:36:26,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2859020.0, ans=0.0 2024-08-14 21:36:28,491 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.851e+01 2024-08-14 21:36:36,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2859120.0, ans=0.125 2024-08-14 21:36:42,817 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-14 21:36:59,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2024-08-14 21:37:02,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2859220.0, ans=0.0 2024-08-14 21:37:03,784 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 32 from Vox, 29 fro AS 2024-08-14 21:37:05,914 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
29 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-14 21:37:22,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2859320.0, ans=0.0 2024-08-14 21:37:25,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10600, loss[loss=0.09706, beats_loss=0.01012, ecapa_loss=0.0001579, whisper_loss=0.08536, over 21048.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001554, whisper_loss=0.09093, over 3915139.16 frames. ], batch size: 86, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:37:57,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2859520.0, ans=0.2 2024-08-14 21:38:01,559 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-14 21:38:07,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.377e+01 2.615e+01 3.017e+01 5.904e+01, threshold=5.231e+01, percent-clipped=2.0 2024-08-14 21:38:10,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2859620.0, ans=0.125 2024-08-14 21:38:52,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10650, loss[loss=0.1191, beats_loss=0.01011, ecapa_loss=0.0001413, whisper_loss=0.1076, over 16885.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001535, whisper_loss=0.09097, over 3918167.71 frames. ], batch size: 67, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:39:00,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-08-14 21:39:06,886 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
23 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-14 21:40:05,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2860320.0, ans=0.2 2024-08-14 21:40:05,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5 2024-08-14 21:40:13,165 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 21:40:15,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10700, loss[loss=0.1194, beats_loss=0.01082, ecapa_loss=0.0001538, whisper_loss=0.107, over 18627.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01072, ecapa_loss=0.0001532, whisper_loss=0.09094, over 3914192.52 frames. ], batch size: 73, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:40:16,433 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 21:40:22,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.81 vs. limit=10.0 2024-08-14 21:40:49,801 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 21:40:57,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.378e+01 2.610e+01 2.912e+01 4.621e+02, threshold=5.220e+01, percent-clipped=2.0 2024-08-14 21:41:07,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-14 21:41:09,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2860720.0, ans=0.0 2024-08-14 21:41:14,430 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-14 21:41:39,946 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 21:41:40,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10750, loss[loss=0.101, beats_loss=0.009142, ecapa_loss=0.0001757, whisper_loss=0.09012, over 19849.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001528, whisper_loss=0.09087, over 3929981.70 frames. ], batch size: 82, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:41:45,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2860920.0, ans=0.125 2024-08-14 21:42:04,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2861020.0, ans=0.125 2024-08-14 21:42:17,293 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.909e-02 2024-08-14 21:42:23,872 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 21:42:58,402 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-14 21:43:04,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2861320.0, ans=0.1 2024-08-14 21:43:09,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10800, loss[loss=0.09133, beats_loss=0.01277, ecapa_loss=0.0001106, whisper_loss=0.07746, over 22591.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01077, ecapa_loss=0.0001521, whisper_loss=0.09123, over 3944875.01 frames. 
], batch size: 90, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:43:24,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2861520.0, ans=0.125 2024-08-14 21:43:24,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2861520.0, ans=0.125 2024-08-14 21:43:29,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2861520.0, ans=0.05 2024-08-14 21:43:39,744 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 21:43:46,233 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 21:43:46,673 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:43:48,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2861620.0, ans=0.1 2024-08-14 21:43:49,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.222e+01 2.555e+01 2.864e+01 4.186e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-14 21:43:51,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2861620.0, ans=0.0 2024-08-14 21:43:58,099 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-14 21:44:28,497 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-14 21:44:34,060 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10850, loss[loss=0.1042, beats_loss=0.01024, ecapa_loss=0.0001854, whisper_loss=0.09214, over 17897.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001541, whisper_loss=0.09147, over 3916839.00 frames. ], batch size: 72, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:45:06,405 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-14 21:45:08,829 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.851e-02 2024-08-14 21:45:17,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2862120.0, ans=0.0 2024-08-14 21:45:23,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=22.5 2024-08-14 21:45:33,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2862220.0, ans=0.07 2024-08-14 21:45:44,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2862320.0, ans=0.0 2024-08-14 21:45:59,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10900, loss[loss=0.1147, beats_loss=0.009379, ecapa_loss=0.0001894, whisper_loss=0.1034, over 22461.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.0001535, whisper_loss=0.09177, over 3919352.21 frames. ], batch size: 90, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:45:59,618 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-14 21:46:00,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. 
limit=10.0 2024-08-14 21:46:03,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2862420.0, ans=0.125 2024-08-14 21:46:12,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2862420.0, ans=0.125 2024-08-14 21:46:39,961 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.360e+01 2.623e+01 2.983e+01 2.734e+02, threshold=5.246e+01, percent-clipped=0.0 2024-08-14 21:46:48,563 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 21:47:05,956 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 21:47:09,047 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-14 21:47:25,514 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 10950, loss[loss=0.1346, beats_loss=0.009034, ecapa_loss=0.0001318, whisper_loss=0.1242, over 23368.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0106, ecapa_loss=0.0001531, whisper_loss=0.09212, over 3935866.53 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:47:26,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-14 21:48:05,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2863120.0, ans=0.1 2024-08-14 21:48:09,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2863120.0, ans=0.1 2024-08-14 21:48:18,630 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-14 21:48:34,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-14 21:48:36,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2863320.0, ans=0.125 2024-08-14 21:48:44,233 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-14 21:48:50,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11000, loss[loss=0.1055, beats_loss=0.01065, ecapa_loss=0.0001553, whisper_loss=0.09332, over 19331.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01057, ecapa_loss=0.0001543, whisper_loss=0.09189, over 3917293.99 frames. ], batch size: 77, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:48:54,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=15.0 2024-08-14 21:49:05,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2863520.0, ans=0.125 2024-08-14 21:49:27,834 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-14 21:49:28,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2863620.0, ans=0.035 2024-08-14 21:49:30,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.380e+01 2.630e+01 2.844e+01 1.265e+02, threshold=5.261e+01, percent-clipped=2.0 2024-08-14 21:49:35,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2863620.0, ans=0.125 2024-08-14 21:49:45,919 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 21:49:54,452 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-14 21:50:00,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2863820.0, ans=0.125 2024-08-14 21:50:04,391 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-14 21:50:06,669 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.768e-01 2024-08-14 21:50:15,514 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11050, loss[loss=0.09963, beats_loss=0.01234, ecapa_loss=0.0001248, whisper_loss=0.08604, over 18942.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001546, whisper_loss=0.09119, over 3916540.50 frames. ], batch size: 73, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 21:50:20,840 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-14 21:50:27,834 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 21:50:34,145 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04806042090058327, model_norm_threshold=52.6092414855957 2024-08-14 21:50:34,323 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.750e+05, grad_sumsq=3.750e+05, orig_rms_sq=1.000e+00 2024-08-14 21:50:35,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.66 vs. 
limit=15.0 2024-08-14 21:51:01,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2864120.0, ans=0.2 2024-08-14 21:51:10,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2864220.0, ans=0.125 2024-08-14 21:51:39,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11100, loss[loss=0.07844, beats_loss=0.01323, ecapa_loss=0.0001237, whisper_loss=0.06397, over 17030.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001555, whisper_loss=0.09058, over 3936802.89 frames. ], batch size: 66, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:51:44,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.78 vs. limit=12.0 2024-08-14 21:51:57,801 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-14 21:51:58,153 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.851e+05 2024-08-14 21:52:19,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.327e+01 2.589e+01 2.870e+01 1.095e+03, threshold=5.179e+01, percent-clipped=2.0 2024-08-14 21:52:43,670 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 21:52:49,361 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 21:52:52,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2864820.0, ans=0.1 2024-08-14 21:52:54,455 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
39 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-14 21:53:04,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11150, loss[loss=0.1098, beats_loss=0.01085, ecapa_loss=0.0001388, whisper_loss=0.09758, over 22929.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001544, whisper_loss=0.0913, over 3932423.06 frames. ], batch size: 89, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:53:27,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2865020.0, ans=0.125 2024-08-14 21:53:34,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2865020.0, ans=0.125 2024-08-14 21:53:44,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-08-14 21:53:54,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2865220.0, ans=0.04949747468305833 2024-08-14 21:54:00,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865220.0, ans=0.1 2024-08-14 21:54:03,978 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.848e+00 2024-08-14 21:54:05,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2865220.0, ans=0.0 2024-08-14 21:54:05,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2865220.0, ans=0.125 2024-08-14 21:54:07,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. 
limit=22.5 2024-08-14 21:54:28,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11200, loss[loss=0.1123, beats_loss=0.01012, ecapa_loss=0.0002014, whisper_loss=0.1002, over 21059.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01053, ecapa_loss=0.000153, whisper_loss=0.09217, over 3941740.96 frames. ], batch size: 89, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:54:30,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2865420.0, ans=0.125 2024-08-14 21:54:38,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865420.0, ans=0.1 2024-08-14 21:54:54,842 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-14 21:54:58,574 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-14 21:55:07,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.313e+01 2.527e+01 2.769e+01 5.053e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-14 21:55:12,216 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 21:55:13,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2865620.0, ans=0.125 2024-08-14 21:55:15,968 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 21:55:34,865 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-14 21:55:36,249 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 21:55:52,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11250, loss[loss=0.1036, beats_loss=0.01216, ecapa_loss=0.0001606, whisper_loss=0.08979, over 22737.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001539, whisper_loss=0.09132, over 3900296.81 frames. ], batch size: 93, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:56:01,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865920.0, ans=0.1 2024-08-14 21:56:13,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2866020.0, ans=0.2 2024-08-14 21:56:13,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2866020.0, ans=0.0 2024-08-14 21:56:32,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2866120.0, ans=0.2 2024-08-14 21:56:37,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2866120.0, ans=0.1 2024-08-14 21:56:42,122 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-14 21:57:09,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866320.0, ans=0.1 2024-08-14 21:57:18,225 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11300, loss[loss=0.1065, beats_loss=0.01143, ecapa_loss=0.0001594, whisper_loss=0.09343, over 19776.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01054, ecapa_loss=0.0001536, whisper_loss=0.09211, over 3908348.88 frames. ], batch size: 81, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:57:58,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.341e+01 2.610e+01 2.928e+01 1.579e+02, threshold=5.221e+01, percent-clipped=2.0 2024-08-14 21:58:04,063 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 30 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-14 21:58:09,287 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 21:58:19,467 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-14 21:58:20,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2866720.0, ans=10.0 2024-08-14 21:58:20,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=22.5 2024-08-14 21:58:25,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-14 21:58:28,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2866820.0, ans=0.125 2024-08-14 21:58:41,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11350, loss[loss=0.08946, beats_loss=0.01179, ecapa_loss=0.0001674, whisper_loss=0.076, over 18584.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0105, ecapa_loss=0.000153, whisper_loss=0.09203, over 3889194.66 frames. ], batch size: 75, lr: 3.10e-03, grad_scale: 1.152921504606847e+18 2024-08-14 21:58:52,148 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-14 21:58:59,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2867020.0, ans=0.125 2024-08-14 21:59:23,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.17 vs. 
limit=22.5 2024-08-14 21:59:25,202 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.136e+01 2024-08-14 21:59:28,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2867120.0, ans=0.0 2024-08-14 21:59:32,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2867220.0, ans=0.1 2024-08-14 21:59:43,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2867220.0, ans=0.1 2024-08-14 21:59:50,501 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-14 22:00:08,098 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11400, loss[loss=0.09736, beats_loss=0.006682, ecapa_loss=0.000227, whisper_loss=0.08841, over 13231.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0105, ecapa_loss=0.0001534, whisper_loss=0.09192, over 3885033.78 frames. 
], batch size: 56, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:00:18,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2867420.0, ans=0.125 2024-08-14 22:00:23,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2867520.0, ans=0.1 2024-08-14 22:00:27,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2867520.0, ans=0.0 2024-08-14 22:00:49,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.381e+01 2.566e+01 2.831e+01 4.188e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-14 22:00:57,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2867720.0, ans=0.125 2024-08-14 22:01:15,430 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 22:01:31,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11450, loss[loss=0.06819, beats_loss=0.01153, ecapa_loss=0.0001544, whisper_loss=0.05511, over 13977.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01049, ecapa_loss=0.0001555, whisper_loss=0.09167, over 3890288.58 frames. ], batch size: 55, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:01:35,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2867920.0, ans=0.125 2024-08-14 22:01:40,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2867920.0, ans=0.125 2024-08-14 22:02:24,867 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 22:02:29,438 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-14 22:02:31,315 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 22:02:34,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-14 22:02:40,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2868320.0, ans=15.0 2024-08-14 22:02:53,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11500, loss[loss=0.1213, beats_loss=0.007884, ecapa_loss=0.0002233, whisper_loss=0.1111, over 21992.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0105, ecapa_loss=0.0001539, whisper_loss=0.09197, over 3878150.81 frames. ], batch size: 91, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:03:01,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-14 22:03:01,955 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-14 22:03:05,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2868420.0, ans=0.125 2024-08-14 22:03:10,239 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 22:03:24,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2868520.0, ans=0.1 2024-08-14 22:03:26,847 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-14 22:03:32,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.58 vs. 
limit=15.0 2024-08-14 22:03:34,594 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.455e+01 2.723e+01 3.029e+01 4.016e+01, threshold=5.445e+01, percent-clipped=0.0 2024-08-14 22:03:35,904 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 38 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-14 22:03:39,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=12.0 2024-08-14 22:03:43,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5 2024-08-14 22:03:44,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.63 vs. limit=6.0 2024-08-14 22:03:46,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2868720.0, ans=0.125 2024-08-14 22:03:58,706 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-14 22:04:01,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2868820.0, ans=0.125 2024-08-14 22:04:18,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11550, loss[loss=0.1225, beats_loss=0.006466, ecapa_loss=0.0001525, whisper_loss=0.1146, over 17572.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01049, ecapa_loss=0.000154, whisper_loss=0.09235, over 3864370.67 frames. ], batch size: 65, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:04:37,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2869020.0, ans=10.0 2024-08-14 22:04:38,736 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 22:04:41,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-14 22:04:44,684 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 22:04:45,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2869020.0, ans=0.125 2024-08-14 22:04:59,901 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-14 22:05:39,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11600, loss[loss=0.08999, beats_loss=0.009948, ecapa_loss=0.0001608, whisper_loss=0.07843, over 16250.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001545, whisper_loss=0.09129, over 3844058.04 frames. ], batch size: 67, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:05:49,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2869420.0, ans=0.0 2024-08-14 22:06:06,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2869520.0, ans=0.125 2024-08-14 22:06:19,922 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.332e+01 2.648e+01 3.162e+01 2.380e+02, threshold=5.297e+01, percent-clipped=2.0 2024-08-14 22:06:25,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2869620.0, ans=0.125 2024-08-14 22:06:38,781 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.288e-02 2024-08-14 22:06:45,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2869820.0, ans=0.1 2024-08-14 
22:06:48,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2869820.0, ans=0.0 2024-08-14 22:07:00,493 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11650, loss[loss=0.1019, beats_loss=0.0113, ecapa_loss=0.0001391, whisper_loss=0.08921, over 21915.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01047, ecapa_loss=0.0001555, whisper_loss=0.09173, over 3831500.17 frames. ], batch size: 87, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:07:38,249 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 22:07:54,951 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-14 22:08:01,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2870220.0, ans=0.0 2024-08-14 22:08:18,085 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-14 22:08:23,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11700, loss[loss=0.09468, beats_loss=0.01218, ecapa_loss=0.0001412, whisper_loss=0.08109, over 15087.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001547, whisper_loss=0.09179, over 3847466.33 frames. ], batch size: 62, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:08:33,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2870420.0, ans=0.09899494936611666 2024-08-14 22:08:36,266 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-14 22:08:39,345 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-14 22:09:02,752 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-14 22:09:06,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.421e+01 2.717e+01 3.040e+01 4.718e+01, threshold=5.433e+01, percent-clipped=0.0 2024-08-14 22:09:20,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2870720.0, ans=0.125 2024-08-14 22:09:38,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2870820.0, ans=0.125 2024-08-14 22:09:45,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11750, loss[loss=0.1233, beats_loss=0.009259, ecapa_loss=0.0001666, whisper_loss=0.1124, over 23299.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001535, whisper_loss=0.09165, over 3861414.59 frames. ], batch size: 88, lr: 3.10e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:09:52,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2870920.0, ans=0.125 2024-08-14 22:09:58,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.88 vs. limit=15.0 2024-08-14 22:10:04,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2871020.0, ans=0.125 2024-08-14 22:10:14,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2871020.0, ans=0.125 2024-08-14 22:10:14,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2871020.0, ans=0.125 2024-08-14 22:10:20,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. 
limit=22.5 2024-08-14 22:10:22,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2871120.0, ans=0.95 2024-08-14 22:10:56,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2871220.0, ans=0.0 2024-08-14 22:11:03,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2871320.0, ans=0.0 2024-08-14 22:11:15,618 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 16 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-14 22:11:22,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11800, loss[loss=0.06366, beats_loss=0.009802, ecapa_loss=0.0001634, whisper_loss=0.05223, over 13861.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001542, whisper_loss=0.091, over 3874643.42 frames. ], batch size: 55, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:11:32,350 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 22:11:37,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2871420.0, ans=0.125 2024-08-14 22:11:38,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2871420.0, ans=0.125 2024-08-14 22:11:49,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2871520.0, ans=0.0 2024-08-14 22:12:03,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.341e+01 2.560e+01 2.807e+01 8.705e+01, threshold=5.119e+01, percent-clipped=2.0 2024-08-14 22:12:10,715 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
20 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 22:12:27,876 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-14 22:12:30,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.44 vs. limit=6.0 2024-08-14 22:12:55,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11850, loss[loss=0.08761, beats_loss=0.01052, ecapa_loss=0.0001641, whisper_loss=0.07544, over 14967.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.0001531, whisper_loss=0.09064, over 3883941.57 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:13:00,017 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 22:13:06,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2871920.0, ans=0.125 2024-08-14 22:13:20,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2024-08-14 22:13:27,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.44 vs. limit=22.5 2024-08-14 22:13:29,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2872020.0, ans=0.1 2024-08-14 22:13:34,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2872020.0, ans=0.07 2024-08-14 22:13:49,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2872120.0, ans=15.0 2024-08-14 22:14:35,643 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-14 22:14:48,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11900, loss[loss=0.102, beats_loss=0.008552, ecapa_loss=0.0001798, whisper_loss=0.09165, over 19461.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01078, ecapa_loss=0.0001539, whisper_loss=0.09038, over 3911923.20 frames. ], batch size: 75, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:14:59,022 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 22:15:00,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2024-08-14 22:15:29,244 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-14 22:15:31,259 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-14 22:15:44,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.256e+01 2.473e+01 2.865e+01 1.430e+02, threshold=4.947e+01, percent-clipped=1.0 2024-08-14 22:15:58,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2872720.0, ans=0.0 2024-08-14 22:16:06,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0 2024-08-14 22:16:26,161 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 22:16:31,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2872820.0, ans=0.0 2024-08-14 22:16:32,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2872820.0, ans=0.125 2024-08-14 22:16:34,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2872820.0, ans=0.125 2024-08-14 22:16:40,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 11950, loss[loss=0.1099, beats_loss=0.009486, ecapa_loss=0.0001896, whisper_loss=0.09852, over 18394.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001539, whisper_loss=0.09042, over 3899750.81 frames. ], batch size: 75, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:17:15,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2873020.0, ans=0.125 2024-08-14 22:17:22,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2024-08-14 22:17:36,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2024-08-14 22:17:39,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2873120.0, ans=0.125 2024-08-14 22:18:05,011 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
32 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 22:18:09,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2873320.0, ans=0.125 2024-08-14 22:18:10,592 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 22:18:21,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12000, loss[loss=0.09595, beats_loss=0.01165, ecapa_loss=0.0001508, whisper_loss=0.0828, over 22020.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.0001538, whisper_loss=0.09052, over 3878826.66 frames. ], batch size: 89, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:18:21,951 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 22:19:04,406 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005404, whisper_loss=0.2466, over 922467.00 frames. 2024-08-14 22:19:20,845 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on SV_voxceleb1: loss=0.004324, beats_loss=0, ecapa_loss=0.0004324, whisper_loss=0, over 939242.00 frames. 2024-08-14 22:21:26,145 INFO [train_multi_KD3.py:1149] (3/4) Epoch 20, validation on AT_audioset: loss=0.02348, beats_loss=0.02348, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-14 22:21:26,148 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 22:21:31,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2873420.0, ans=0.0 2024-08-14 22:21:34,067 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-14 22:21:40,304 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 22:22:03,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.280e+01 2.500e+01 2.714e+01 9.772e+01, threshold=5.000e+01, percent-clipped=1.0 2024-08-14 22:22:17,294 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-14 22:22:20,342 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-14 22:22:40,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2873920.0, ans=0.125 2024-08-14 22:22:41,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12050, loss[loss=0.09289, beats_loss=0.01129, ecapa_loss=0.0001555, whisper_loss=0.08004, over 17717.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01075, ecapa_loss=0.0001529, whisper_loss=0.08991, over 3861406.31 frames. ], batch size: 73, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:23:15,823 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:23:18,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=12.0 2024-08-14 22:23:19,779 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-14 22:23:20,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-08-14 22:23:23,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874120.0, ans=0.1 2024-08-14 22:23:23,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.15 vs. 
limit=15.0 2024-08-14 22:23:24,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2024-08-14 22:23:40,973 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 22:23:46,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=22.5 2024-08-14 22:23:48,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874320.0, ans=0.1 2024-08-14 22:23:56,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2874420.0, ans=0.2 2024-08-14 22:23:56,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2874420.0, ans=0.0 2024-08-14 22:23:56,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12100, loss[loss=0.06986, beats_loss=0.01397, ecapa_loss=0.0001236, whisper_loss=0.05466, over 15940.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001549, whisper_loss=0.08977, over 3855104.03 frames. ], batch size: 64, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:24:02,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.72 vs. 
limit=15.0 2024-08-14 22:24:14,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2874520.0, ans=0.05 2024-08-14 22:24:28,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2874620.0, ans=0.125 2024-08-14 22:24:35,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2874620.0, ans=0.0 2024-08-14 22:24:35,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.418e+01 2.574e+01 2.910e+01 4.724e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-14 22:24:36,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-08-14 22:24:37,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2874620.0, ans=0.125 2024-08-14 22:24:39,114 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 22:24:46,575 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 22:24:51,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.64 vs. limit=22.5 2024-08-14 22:25:13,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12150, loss[loss=0.05772, beats_loss=0.01317, ecapa_loss=0.0001605, whisper_loss=0.04294, over 14635.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01067, ecapa_loss=0.000155, whisper_loss=0.08917, over 3819208.07 frames. 
], batch size: 62, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:25:39,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.27 vs. limit=10.0 2024-08-14 22:25:45,002 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-14 22:25:48,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2875120.0, ans=0.125 2024-08-14 22:25:50,496 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-14 22:26:02,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=12.0 2024-08-14 22:26:11,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2875220.0, ans=0.125 2024-08-14 22:26:15,164 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-14 22:26:17,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-14 22:26:28,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2024-08-14 22:26:28,840 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12200, loss[loss=0.1039, beats_loss=0.01087, ecapa_loss=0.0001805, whisper_loss=0.09124, over 21783.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001547, whisper_loss=0.09009, over 3863579.56 frames. 
], batch size: 90, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:26:34,102 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:26:54,809 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-14 22:26:58,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2875620.0, ans=0.0 2024-08-14 22:27:06,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.621e+01 2.982e+01 1.533e+02, threshold=5.242e+01, percent-clipped=1.0 2024-08-14 22:27:10,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2875620.0, ans=0.125 2024-08-14 22:27:21,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875720.0, ans=0.1 2024-08-14 22:27:30,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2875820.0, ans=0.0 2024-08-14 22:27:35,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2875820.0, ans=0.125 2024-08-14 22:27:45,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12250, loss[loss=0.08036, beats_loss=0.01211, ecapa_loss=0.0001451, whisper_loss=0.0668, over 19410.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001544, whisper_loss=0.09002, over 3870018.20 frames. ], batch size: 81, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:27:47,757 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 22:27:53,452 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-14 22:28:15,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2876120.0, ans=0.0 2024-08-14 22:28:29,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2876120.0, ans=0.05 2024-08-14 22:28:32,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2876220.0, ans=0.125 2024-08-14 22:28:36,797 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 22:28:38,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2876220.0, ans=0.0 2024-08-14 22:29:02,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12300, loss[loss=0.0935, beats_loss=0.009958, ecapa_loss=0.0001724, whisper_loss=0.08181, over 18478.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001562, whisper_loss=0.08977, over 3865266.87 frames. ], batch size: 79, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:29:07,461 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-14 22:29:14,753 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-14 22:29:19,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=12.0 2024-08-14 22:29:22,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2876520.0, ans=0.125 2024-08-14 22:29:29,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2876520.0, ans=0.125 2024-08-14 22:29:39,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.269e+01 2.570e+01 2.894e+01 3.715e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-14 22:29:55,361 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 22:29:55,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2876720.0, ans=0.125 2024-08-14 22:29:59,389 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-14 22:30:02,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2876820.0, ans=0.125 2024-08-14 22:30:04,156 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-14 22:30:05,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2876820.0, ans=0.125 2024-08-14 22:30:17,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12350, loss[loss=0.102, beats_loss=0.008641, ecapa_loss=0.0001791, whisper_loss=0.09158, over 19086.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0106, ecapa_loss=0.0001569, whisper_loss=0.09007, over 3870510.42 frames. ], batch size: 79, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:30:26,276 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
38 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 22:30:27,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0 2024-08-14 22:30:39,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2877020.0, ans=0.035 2024-08-14 22:30:46,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2877020.0, ans=0.0 2024-08-14 22:31:02,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2877120.0, ans=10.0 2024-08-14 22:31:11,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2877220.0, ans=0.05 2024-08-14 22:31:13,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2877220.0, ans=0.125 2024-08-14 22:31:28,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.08 vs. limit=22.5 2024-08-14 22:31:34,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2877420.0, ans=0.125 2024-08-14 22:31:34,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.73 vs. limit=10.0 2024-08-14 22:31:34,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12400, loss[loss=0.08212, beats_loss=0.01211, ecapa_loss=0.0001813, whisper_loss=0.0682, over 21966.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001567, whisper_loss=0.09, over 3880361.80 frames. 
], batch size: 92, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:31:43,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-08-14 22:31:50,141 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-14 22:31:53,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2877520.0, ans=0.2 2024-08-14 22:32:00,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2877520.0, ans=0.125 2024-08-14 22:32:03,084 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-14 22:32:03,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-08-14 22:32:07,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2877620.0, ans=0.125 2024-08-14 22:32:12,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-08-14 22:32:13,140 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.352e+01 2.633e+01 2.974e+01 1.809e+02, threshold=5.265e+01, percent-clipped=2.0 2024-08-14 22:32:33,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-14 22:32:36,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2877820.0, ans=0.1 2024-08-14 22:32:44,459 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-14 22:32:49,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12450, loss[loss=0.1067, beats_loss=0.009025, ecapa_loss=0.0001361, whisper_loss=0.09636, over 17159.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001564, whisper_loss=0.08989, over 3879557.28 frames. ], batch size: 67, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:33:03,552 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 24 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-14 22:33:06,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878020.0, ans=0.1 2024-08-14 22:33:08,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2878020.0, ans=0.125 2024-08-14 22:33:11,019 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-14 22:33:21,721 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-14 22:33:22,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2878120.0, ans=0.07 2024-08-14 22:33:33,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2878220.0, ans=0.125 2024-08-14 22:33:39,451 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-14 22:33:39,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2878220.0, ans=0.125 2024-08-14 22:34:04,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12500, loss[loss=0.1043, beats_loss=0.009748, ecapa_loss=0.0001656, whisper_loss=0.09291, over 17631.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001544, whisper_loss=0.09012, over 3872330.29 frames. ], batch size: 70, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:34:19,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2878520.0, ans=0.125 2024-08-14 22:34:25,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2878520.0, ans=0.125 2024-08-14 22:34:27,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2878520.0, ans=0.1 2024-08-14 22:34:29,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878520.0, ans=0.1 2024-08-14 22:34:43,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.340e+01 2.618e+01 2.965e+01 2.177e+02, threshold=5.235e+01, percent-clipped=3.0 2024-08-14 22:34:46,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2878620.0, ans=0.2 2024-08-14 22:35:21,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12550, loss[loss=0.07971, beats_loss=0.01009, ecapa_loss=0.0002059, whisper_loss=0.06756, over 21995.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001545, whisper_loss=0.09011, over 3844661.87 frames. ], batch size: 95, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:35:29,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2878920.0, ans=0.125 2024-08-14 22:35:47,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2879020.0, ans=0.1 2024-08-14 22:35:52,922 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-14 22:35:54,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2879120.0, ans=0.1 2024-08-14 22:36:02,462 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 22:36:35,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12600, loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001759, whisper_loss=0.0896, over 19507.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001547, whisper_loss=0.09004, over 3866130.96 frames. ], batch size: 81, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:36:58,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2879520.0, ans=0.125 2024-08-14 22:37:10,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-14 22:37:13,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.404e+01 2.680e+01 3.035e+01 5.751e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-14 22:37:36,840 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 16 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-14 22:37:51,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12650, loss[loss=0.1007, beats_loss=0.01321, ecapa_loss=0.0001575, whisper_loss=0.08593, over 21260.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001541, whisper_loss=0.09115, over 3876080.90 frames. ], batch size: 88, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:38:01,993 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-14 22:38:31,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2880120.0, ans=0.125 2024-08-14 22:38:37,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2880120.0, ans=0.0 2024-08-14 22:38:56,262 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-14 22:38:59,799 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 22:39:05,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880320.0, ans=0.1 2024-08-14 22:39:09,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12700, loss[loss=0.1152, beats_loss=0.01129, ecapa_loss=0.000138, whisper_loss=0.1025, over 23462.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.0001534, whisper_loss=0.09204, over 3907157.51 frames. ], batch size: 93, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:39:16,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2880420.0, ans=0.0 2024-08-14 22:39:21,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-14 22:39:22,344 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 22:39:25,223 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-14 22:39:35,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2880520.0, ans=0.125 2024-08-14 22:39:48,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.668e+01 2.293e+01 2.553e+01 2.866e+01 4.469e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-14 22:39:59,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2880720.0, ans=0.125 2024-08-14 22:40:28,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12750, loss[loss=0.109, beats_loss=0.01151, ecapa_loss=0.0001362, whisper_loss=0.09614, over 20129.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01076, ecapa_loss=0.000153, whisper_loss=0.09159, over 3912711.30 frames. ], batch size: 79, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:40:28,968 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:40:33,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2880920.0, ans=0.2 2024-08-14 22:40:34,501 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-14 22:40:37,846 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 41 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-14 22:41:09,658 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-14 22:41:17,806 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-14 22:41:30,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-08-14 22:41:47,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12800, loss[loss=0.1105, beats_loss=0.01246, ecapa_loss=0.0001066, whisper_loss=0.09692, over 17311.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001529, whisper_loss=0.09154, over 3884249.47 frames. ], batch size: 64, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:42:04,478 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-14 22:42:06,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2881520.0, ans=0.125 2024-08-14 22:42:11,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2881520.0, ans=0.1 2024-08-14 22:42:19,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2024-08-14 22:42:21,383 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 24 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-14 22:42:27,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.348e+01 2.569e+01 2.991e+01 4.323e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-14 22:42:35,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-14 22:42:37,425 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-14 22:42:39,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2881720.0, ans=0.0 2024-08-14 22:42:39,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.83 vs. 
limit=10.0 2024-08-14 22:42:43,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-14 22:42:45,431 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 36 from Vox, 29 fro AS 2024-08-14 22:42:51,391 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-14 22:42:54,689 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-14 22:43:00,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2881820.0, ans=0.2 2024-08-14 22:43:07,185 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12850, loss[loss=0.07141, beats_loss=0.01036, ecapa_loss=0.0001822, whisper_loss=0.05923, over 15390.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01071, ecapa_loss=0.0001548, whisper_loss=0.09126, over 3846455.80 frames. ], batch size: 62, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:43:12,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2881920.0, ans=0.125 2024-08-14 22:43:17,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-08-14 22:43:25,099 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 22:43:26,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2882020.0, ans=0.0 2024-08-14 22:43:31,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. 
limit=15.0 2024-08-14 22:43:33,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2882020.0, ans=0.125 2024-08-14 22:43:38,852 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-14 22:43:59,779 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-14 22:43:59,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2882220.0, ans=0.2 2024-08-14 22:44:01,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882220.0, ans=0.1 2024-08-14 22:44:03,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2882220.0, ans=0.0 2024-08-14 22:44:09,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2882320.0, ans=0.0 2024-08-14 22:44:26,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12900, loss[loss=0.07447, beats_loss=0.01202, ecapa_loss=0.0001543, whisper_loss=0.06091, over 14351.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001538, whisper_loss=0.09095, over 3818455.32 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:44:31,604 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 22:44:35,207 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-14 22:44:35,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2882420.0, ans=0.2 2024-08-14 22:44:35,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2882420.0, ans=0.125 2024-08-14 22:45:02,256 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-14 22:45:02,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2882620.0, ans=0.125 2024-08-14 22:45:06,659 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.541e+01 3.103e+01 4.872e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-14 22:45:22,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882720.0, ans=0.1 2024-08-14 22:45:27,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2882720.0, ans=0.0 2024-08-14 22:45:39,265 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 22:45:41,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2882820.0, ans=0.125 2024-08-14 22:45:46,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 12950, loss[loss=0.1019, beats_loss=0.01076, ecapa_loss=0.0001471, whisper_loss=0.08966, over 13972.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001528, whisper_loss=0.09082, over 3810502.07 frames. ], batch size: 54, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:46:04,709 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-14 22:46:08,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2883020.0, ans=0.0 2024-08-14 22:46:09,187 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-14 22:46:18,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2883120.0, ans=0.2 2024-08-14 22:46:30,752 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-14 22:46:45,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2024-08-14 22:47:05,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13000, loss[loss=0.08547, beats_loss=0.01232, ecapa_loss=0.000177, whisper_loss=0.07138, over 19944.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001533, whisper_loss=0.08972, over 3812218.14 frames. ], batch size: 85, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:47:10,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.93 vs. limit=22.5 2024-08-14 22:47:11,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2883420.0, ans=0.025 2024-08-14 22:47:24,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2883520.0, ans=0.1 2024-08-14 22:47:40,415 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-14 22:47:44,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.265e+01 2.555e+01 2.965e+01 9.531e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-14 22:48:18,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2883820.0, ans=0.0 2024-08-14 22:48:23,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13050, loss[loss=0.102, beats_loss=0.01186, ecapa_loss=0.0001143, whisper_loss=0.08898, over 17837.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.0001542, whisper_loss=0.08952, over 3826659.20 frames. ], batch size: 68, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:48:41,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2884020.0, ans=0.125 2024-08-14 22:49:12,690 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-14 22:49:23,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2884320.0, ans=0.1 2024-08-14 22:49:39,055 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13100, loss[loss=0.1255, beats_loss=0.00871, ecapa_loss=0.0001741, whisper_loss=0.115, over 22321.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0107, ecapa_loss=0.0001533, whisper_loss=0.09075, over 3838061.97 frames. ], batch size: 89, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:49:56,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884520.0, ans=0.1 2024-08-14 22:49:57,335 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 22:50:00,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2884520.0, ans=0.1 2024-08-14 22:50:06,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=12.0 2024-08-14 22:50:14,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-14 22:50:17,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.252e+01 2.466e+01 2.762e+01 3.730e+01, threshold=4.933e+01, percent-clipped=0.0 2024-08-14 22:50:20,933 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-14 22:50:26,865 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-14 22:50:30,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2024-08-14 22:50:31,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=15.0 2024-08-14 22:50:40,742 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-14 22:50:55,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13150, loss[loss=0.1233, beats_loss=0.008108, ecapa_loss=0.0001471, whisper_loss=0.1137, over 21581.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001528, whisper_loss=0.09051, over 3840525.37 frames. 
], batch size: 84, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:51:00,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884920.0, ans=0.1 2024-08-14 22:51:04,483 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-14 22:51:08,850 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-14 22:51:17,725 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-14 22:51:30,723 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-14 22:51:38,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2885120.0, ans=0.1 2024-08-14 22:51:39,462 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 14 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-14 22:51:44,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885220.0, ans=0.1 2024-08-14 22:52:01,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-14 22:52:11,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13200, loss[loss=0.08848, beats_loss=0.01207, ecapa_loss=0.0001669, whisper_loss=0.07475, over 16747.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001531, whisper_loss=0.09007, over 3834405.83 frames. 
], batch size: 70, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:52:22,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2885420.0, ans=0.1 2024-08-14 22:52:25,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-14 22:52:37,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2885520.0, ans=0.0 2024-08-14 22:52:49,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.330e+01 2.568e+01 2.844e+01 4.836e+01, threshold=5.136e+01, percent-clipped=0.0 2024-08-14 22:52:52,934 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-14 22:52:54,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2885620.0, ans=0.125 2024-08-14 22:53:00,776 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 22:53:05,001 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-14 22:53:06,661 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-14 22:53:17,375 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 7 from Vox, 46 fro AS 2024-08-14 22:53:27,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13250, loss[loss=0.1143, beats_loss=0.007189, ecapa_loss=0.0001471, whisper_loss=0.1057, over 16622.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001536, whisper_loss=0.09079, over 3855849.76 frames. 
], batch size: 59, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:53:40,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=2885920.0, ans=12.0 2024-08-14 22:53:55,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2886020.0, ans=0.125 2024-08-14 22:54:19,179 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-14 22:54:29,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2886320.0, ans=0.125 2024-08-14 22:54:36,578 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-14 22:54:42,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-08-14 22:54:42,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13300, loss[loss=0.09026, beats_loss=0.01058, ecapa_loss=0.0001742, whisper_loss=0.07793, over 17738.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001536, whisper_loss=0.09098, over 3845368.46 frames. ], batch size: 73, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:54:53,344 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 22:55:19,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2886620.0, ans=0.125 2024-08-14 22:55:20,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.690e+01 2.319e+01 2.603e+01 2.951e+01 5.075e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-14 22:55:27,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2886720.0, ans=0.1 2024-08-14 22:55:39,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2886720.0, ans=0.125 2024-08-14 22:55:58,363 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13350, loss[loss=0.1242, beats_loss=0.007769, ecapa_loss=0.0001517, whisper_loss=0.1149, over 15874.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001539, whisper_loss=0.09116, over 3855095.02 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 5.764607523034235e+17 2024-08-14 22:56:18,554 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-14 22:56:18,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2887020.0, ans=0.2 2024-08-14 22:56:21,580 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-14 22:56:23,047 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 22:56:24,493 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-14 22:56:32,051 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-14 22:56:43,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2887220.0, ans=0.125 2024-08-14 22:56:47,140 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-14 22:56:50,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2887220.0, ans=0.025 2024-08-14 22:56:51,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2887220.0, ans=0.0 2024-08-14 22:56:53,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-14 22:57:06,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-14 22:57:13,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13400, loss[loss=0.09314, beats_loss=0.01143, ecapa_loss=0.0001436, whisper_loss=0.08027, over 14884.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001533, whisper_loss=0.09122, over 3876044.11 frames. ], batch size: 59, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:57:15,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2887420.0, ans=0.5 2024-08-14 22:57:17,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2887420.0, ans=0.5 2024-08-14 22:57:20,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. 
limit=12.0 2024-08-14 22:57:42,618 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-14 22:57:50,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.321e+01 2.514e+01 2.715e+01 4.037e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-14 22:57:53,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2887620.0, ans=0.125 2024-08-14 22:57:55,960 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 22:57:58,823 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-14 22:58:00,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2887720.0, ans=0.1 2024-08-14 22:58:01,772 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 35 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-14 22:58:06,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2887720.0, ans=0.2 2024-08-14 22:58:08,953 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 7 from Vox, 26 fro AS 2024-08-14 22:58:12,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2887820.0, ans=0.125 2024-08-14 22:58:28,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13450, loss[loss=0.1298, beats_loss=0.01008, ecapa_loss=0.0001548, whisper_loss=0.1182, over 23045.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001536, whisper_loss=0.09099, over 3850551.91 frames. 
], batch size: 90, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:58:31,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2887920.0, ans=0.125 2024-08-14 22:58:36,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2887920.0, ans=0.1 2024-08-14 22:58:47,390 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 22:58:50,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2888020.0, ans=0.1 2024-08-14 22:59:40,569 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13500, loss[loss=0.08901, beats_loss=0.01409, ecapa_loss=0.000154, whisper_loss=0.07338, over 21964.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001536, whisper_loss=0.09039, over 3863995.70 frames. ], batch size: 94, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 22:59:49,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2888420.0, ans=0.05 2024-08-14 23:00:16,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888620.0, ans=0.125 2024-08-14 23:00:18,919 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.433e+01 2.626e+01 2.844e+01 4.134e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-14 23:00:21,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2888620.0, ans=0.125 2024-08-14 23:00:24,058 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
23 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-14 23:00:46,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2888820.0, ans=0.125 2024-08-14 23:00:48,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2888820.0, ans=0.125 2024-08-14 23:00:56,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13550, loss[loss=0.1067, beats_loss=0.01288, ecapa_loss=0.0001026, whisper_loss=0.09284, over 19885.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001526, whisper_loss=0.09035, over 3868695.12 frames. ], batch size: 76, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:01:03,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2888920.0, ans=0.0 2024-08-14 23:01:17,299 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-14 23:01:28,895 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-14 23:01:33,174 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-14 23:01:36,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=2889120.0, ans=0.1 2024-08-14 23:01:56,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2889320.0, ans=0.0 2024-08-14 23:02:02,433 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 11 from Vox, 43 fro AS 2024-08-14 23:02:02,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2889320.0, ans=0.2 2024-08-14 23:02:03,721 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-14 23:02:09,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13600, loss[loss=0.07724, beats_loss=0.011, ecapa_loss=0.0001554, whisper_loss=0.06468, over 17927.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001515, whisper_loss=0.08992, over 3873401.28 frames. ], batch size: 76, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:02:29,735 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-14 23:02:29,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2889520.0, ans=0.125 2024-08-14 23:02:35,525 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-14 23:02:45,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.266e+01 2.507e+01 2.843e+01 1.129e+02, threshold=5.014e+01, percent-clipped=1.0 2024-08-14 23:02:55,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-14 23:02:59,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2889720.0, ans=0.0 2024-08-14 23:02:59,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2889720.0, ans=0.0 2024-08-14 23:03:03,993 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-14 23:03:05,274 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-14 23:03:11,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. 
limit=15.0 2024-08-14 23:03:13,782 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-14 23:03:20,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-14 23:03:22,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13650, loss[loss=0.0992, beats_loss=0.008594, ecapa_loss=0.000164, whisper_loss=0.08897, over 15856.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01083, ecapa_loss=0.0001518, whisper_loss=0.08936, over 3866091.60 frames. ], batch size: 61, lr: 3.09e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:03:24,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2889920.0, ans=0.125 2024-08-14 23:03:31,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2889920.0, ans=0.1 2024-08-14 23:03:38,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2890020.0, ans=0.125 2024-08-14 23:03:41,009 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-14 23:03:50,019 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-14 23:03:56,130 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-14 23:04:17,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2890220.0, ans=0.025 2024-08-14 23:04:18,576 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-14 23:04:19,900 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
10 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-14 23:04:23,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-14 23:04:26,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2890320.0, ans=0.0 2024-08-14 23:04:31,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2890320.0, ans=0.0 2024-08-14 23:04:37,804 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13700, loss[loss=0.1022, beats_loss=0.00935, ecapa_loss=0.000144, whisper_loss=0.09137, over 16700.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01083, ecapa_loss=0.0001514, whisper_loss=0.08962, over 3874294.25 frames. ], batch size: 62, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:04:41,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2890420.0, ans=0.125 2024-08-14 23:04:50,159 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-14 23:05:13,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2890620.0, ans=0.0 2024-08-14 23:05:15,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.409e+01 2.626e+01 3.025e+01 7.983e+01, threshold=5.251e+01, percent-clipped=2.0 2024-08-14 23:05:42,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2890820.0, ans=0.125 2024-08-14 23:05:46,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.93 vs. 
limit=15.0 2024-08-14 23:05:51,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2890920.0, ans=0.1 2024-08-14 23:05:52,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13750, loss[loss=0.1091, beats_loss=0.006358, ecapa_loss=0.0001861, whisper_loss=0.1009, over 19240.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01072, ecapa_loss=0.0001523, whisper_loss=0.08981, over 3840644.62 frames. ], batch size: 72, lr: 3.08e-03, grad_scale: 1.152921504606847e+18 2024-08-14 23:05:56,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2890920.0, ans=0.125 2024-08-14 23:05:57,346 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 27 from LS+wenet, 9 from Vox, 21 fro AS 2024-08-14 23:06:10,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. 
limit=6.0 2024-08-14 23:06:11,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2891020.0, ans=0.125 2024-08-14 23:06:28,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2891120.0, ans=0.2 2024-08-14 23:06:28,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2891120.0, ans=0.125 2024-08-14 23:06:31,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2891120.0, ans=0.125 2024-08-14 23:06:44,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2891220.0, ans=0.125 2024-08-14 23:06:53,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2891320.0, ans=0.0 2024-08-14 23:07:07,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13800, loss[loss=0.09011, beats_loss=0.01112, ecapa_loss=0.0001723, whisper_loss=0.07727, over 18853.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.0001524, whisper_loss=0.09034, over 3866454.22 frames. ], batch size: 77, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:07:12,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-14 23:07:33,759 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-14 23:07:38,300 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-14 23:07:42,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2891620.0, ans=0.2 2024-08-14 23:07:43,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0 2024-08-14 23:07:43,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2024-08-14 23:07:44,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2891620.0, ans=0.0 2024-08-14 23:07:46,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.261e+01 2.530e+01 3.041e+01 4.646e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-14 23:07:53,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-14 23:08:22,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13850, loss[loss=0.1261, beats_loss=0.009867, ecapa_loss=0.0001279, whisper_loss=0.1149, over 24397.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001518, whisper_loss=0.09047, over 3864192.21 frames. ], batch size: 93, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:08:29,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2891920.0, ans=0.125 2024-08-14 23:08:33,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2891920.0, ans=0.125 2024-08-14 23:08:50,785 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
24 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-14 23:09:06,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2892120.0, ans=0.125 2024-08-14 23:09:18,610 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-14 23:09:20,295 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-14 23:09:24,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2892320.0, ans=0.0 2024-08-14 23:09:25,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2892320.0, ans=15.0 2024-08-14 23:09:40,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13900, loss[loss=0.1084, beats_loss=0.01056, ecapa_loss=0.0001598, whisper_loss=0.09625, over 18177.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001515, whisper_loss=0.09082, over 3882286.37 frames. ], batch size: 73, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:09:59,057 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-14 23:10:06,349 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-14 23:10:14,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2892620.0, ans=0.02 2024-08-14 23:10:19,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2892620.0, ans=0.125 2024-08-14 23:10:20,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.401e+01 2.644e+01 3.017e+01 4.177e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-14 23:10:22,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2892620.0, ans=10.0 2024-08-14 23:10:37,892 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 23:10:40,966 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-14 23:10:41,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2892820.0, ans=0.0 2024-08-14 23:10:49,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2892820.0, ans=0.125 2024-08-14 23:10:56,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 13950, loss[loss=0.1101, beats_loss=0.007981, ecapa_loss=0.0001842, whisper_loss=0.1003, over 15984.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001512, whisper_loss=0.09119, over 3852639.39 frames. ], batch size: 65, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:10:56,342 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 23:10:58,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2892920.0, ans=0.0 2024-08-14 23:10:58,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2892920.0, ans=0.125 2024-08-14 23:11:08,304 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 18 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-14 23:11:13,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2893020.0, ans=0.125 2024-08-14 23:11:16,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2893020.0, ans=0.1 2024-08-14 23:11:24,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2024-08-14 23:11:43,212 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-14 23:12:02,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2024-08-14 23:12:09,952 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-14 23:12:11,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14000, loss[loss=0.1009, beats_loss=0.009626, ecapa_loss=0.0001454, whisper_loss=0.08983, over 20758.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01067, ecapa_loss=0.0001502, whisper_loss=0.0913, over 3868217.17 frames. ], batch size: 79, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:12:13,370 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
27 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-14 23:12:37,707 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-14 23:12:50,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.312e+01 2.545e+01 2.868e+01 4.909e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-14 23:13:11,389 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 23:13:11,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2893820.0, ans=0.2 2024-08-14 23:13:16,064 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-14 23:13:23,380 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-14 23:13:26,002 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-14 23:13:26,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2893920.0, ans=0.125 2024-08-14 23:13:27,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14050, loss[loss=0.1206, beats_loss=0.009922, ecapa_loss=0.0001383, whisper_loss=0.1093, over 14710.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09212, over 3831316.44 frames. ], batch size: 54, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:13:34,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.75 vs. 
limit=15.0 2024-08-14 23:13:36,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2893920.0, ans=0.1 2024-08-14 23:13:40,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2893920.0, ans=0.125 2024-08-14 23:13:52,799 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-14 23:13:59,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=12.0 2024-08-14 23:13:59,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2894120.0, ans=6.0 2024-08-14 23:14:33,536 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-14 23:14:42,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14100, loss[loss=0.1101, beats_loss=0.0104, ecapa_loss=0.0001306, whisper_loss=0.09838, over 21505.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01061, ecapa_loss=0.0001502, whisper_loss=0.09206, over 3830943.66 frames. ], batch size: 85, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:14:44,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-14 23:14:51,627 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-14 23:14:53,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2024-08-14 23:15:00,535 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-14 23:15:21,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.405e+01 2.712e+01 3.125e+01 2.483e+02, threshold=5.424e+01, percent-clipped=1.0 2024-08-14 23:15:23,269 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-14 23:15:26,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2894720.0, ans=0.125 2024-08-14 23:15:43,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2894820.0, ans=0.125 2024-08-14 23:15:47,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2894820.0, ans=0.125 2024-08-14 23:15:54,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2894820.0, ans=0.125 2024-08-14 23:15:56,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2894920.0, ans=0.125 2024-08-14 23:15:57,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14150, loss[loss=0.1256, beats_loss=0.009024, ecapa_loss=0.0001494, whisper_loss=0.1151, over 22904.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01058, ecapa_loss=0.0001508, whisper_loss=0.09224, over 3851399.91 frames. 
], batch size: 89, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:16:11,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2895020.0, ans=0.125 2024-08-14 23:16:18,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2895020.0, ans=15.0 2024-08-14 23:16:31,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2024-08-14 23:16:50,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2895220.0, ans=0.125 2024-08-14 23:16:56,460 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-14 23:17:12,618 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14200, loss[loss=0.0897, beats_loss=0.01385, ecapa_loss=0.000113, whisper_loss=0.07472, over 22327.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001502, whisper_loss=0.09167, over 3859327.59 frames. ], batch size: 90, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:17:26,545 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-14 23:17:37,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2895520.0, ans=0.125 2024-08-14 23:17:38,888 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
17 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-14 23:17:52,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.327e+01 2.651e+01 3.057e+01 2.487e+02, threshold=5.302e+01, percent-clipped=2.0 2024-08-14 23:18:28,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14250, loss[loss=0.1127, beats_loss=0.01138, ecapa_loss=0.0001356, whisper_loss=0.09999, over 22001.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0106, ecapa_loss=0.0001508, whisper_loss=0.09181, over 3890503.79 frames. ], batch size: 87, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:19:11,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2896120.0, ans=0.95 2024-08-14 23:19:35,264 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-14 23:19:40,897 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 13 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-14 23:19:45,090 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14300, loss[loss=0.1243, beats_loss=0.009687, ecapa_loss=0.0001347, whisper_loss=0.1133, over 23268.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001505, whisper_loss=0.0917, over 3897800.32 frames. ], batch size: 86, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:19:59,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2896520.0, ans=0.0 2024-08-14 23:20:06,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.82 vs. limit=22.5 2024-08-14 23:20:12,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2896520.0, ans=0.1 2024-08-14 23:20:19,504 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
21 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-14 23:20:19,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2896620.0, ans=0.125 2024-08-14 23:20:22,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2896620.0, ans=0.0 2024-08-14 23:20:22,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2896620.0, ans=0.0 2024-08-14 23:20:23,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.328e+01 2.532e+01 2.890e+01 9.959e+01, threshold=5.063e+01, percent-clipped=3.0 2024-08-14 23:20:42,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2024-08-14 23:20:58,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14350, loss[loss=0.1243, beats_loss=0.009518, ecapa_loss=0.0001544, whisper_loss=0.1132, over 24510.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001499, whisper_loss=0.09147, over 3906519.49 frames. ], batch size: 94, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:21:04,772 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-14 23:21:05,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2896920.0, ans=0.2 2024-08-14 23:21:10,653 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-14 23:21:18,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2897020.0, ans=0.5 2024-08-14 23:21:29,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2897120.0, ans=0.125 2024-08-14 23:21:34,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2897120.0, ans=0.0 2024-08-14 23:21:39,989 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 23:21:50,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2897220.0, ans=0.1 2024-08-14 23:21:58,768 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-14 23:22:00,176 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-14 23:22:11,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14400, loss[loss=0.1036, beats_loss=0.01176, ecapa_loss=0.0001572, whisper_loss=0.09024, over 22459.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001517, whisper_loss=0.09172, over 3898882.06 frames. ], batch size: 92, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:22:11,883 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-14 23:22:19,336 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-14 23:22:29,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2897520.0, ans=0.0 2024-08-14 23:22:42,786 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-14 23:22:46,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2897620.0, ans=0.1 2024-08-14 23:22:51,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.446e+01 2.693e+01 3.071e+01 5.241e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-14 23:23:09,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2024-08-14 23:23:10,306 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-14 23:23:12,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-08-14 23:23:20,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2897820.0, ans=0.125 2024-08-14 23:23:25,820 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-14 23:23:26,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 20, batch 14450, loss[loss=0.1021, beats_loss=0.009478, ecapa_loss=0.0001458, whisper_loss=0.09121, over 22136.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001524, whisper_loss=0.09111, over 3890252.39 frames. ], batch size: 87, lr: 3.08e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:23:34,435 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-14 23:23:43,762 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-14 23:23:52,213 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-14 23:23:52,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2898020.0, ans=0.1 2024-08-14 23:24:02,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-08-14 23:24:21,944 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-14 23:24:25,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2898320.0, ans=0.2 2024-08-14 23:25:02,514 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 0, loss[loss=0.08956, beats_loss=0.009436, ecapa_loss=0.0002176, whisper_loss=0.07795, over 18592.00 frames. ], tot_loss[loss=0.08956, beats_loss=0.009436, ecapa_loss=0.0002176, whisper_loss=0.07795, over 18592.00 frames. ], batch size: 82, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:25:02,515 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-14 23:25:46,224 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005489, whisper_loss=0.2481, over 922467.00 frames. 2024-08-14 23:26:02,325 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on SV_voxceleb1: loss=0.004256, beats_loss=0, ecapa_loss=0.0004256, whisper_loss=0, over 939242.00 frames. 2024-08-14 23:28:02,898 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on AT_audioset: loss=0.02343, beats_loss=0.02343, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-14 23:28:02,901 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-14 23:28:04,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2898350.0, ans=0.1 2024-08-14 23:28:04,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2898350.0, ans=0.125 2024-08-14 23:28:19,018 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-14 23:28:25,226 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-14 23:28:36,630 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-14 23:28:46,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2898450.0, ans=15.0 2024-08-14 23:28:53,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2898450.0, ans=0.0 2024-08-14 23:28:58,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2898550.0, ans=0.0 2024-08-14 23:28:58,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2898550.0, ans=0.125 2024-08-14 23:29:30,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.477e+01 2.723e+01 3.011e+01 4.734e+01, threshold=5.445e+01, percent-clipped=0.0 2024-08-14 23:29:57,290 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-14 23:30:01,633 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-14 23:30:13,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 50, loss[loss=0.09626, beats_loss=0.01074, ecapa_loss=0.0001423, whisper_loss=0.0841, over 23041.00 frames. ], tot_loss[loss=0.09884, beats_loss=0.009803, ecapa_loss=0.0001578, whisper_loss=0.08746, over 843611.84 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:30:13,915 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-14 23:30:35,233 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-14 23:30:57,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2898950.0, ans=0.1 2024-08-14 23:31:23,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2899050.0, ans=0.0 2024-08-14 23:31:45,900 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-14 23:31:57,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=12.0 2024-08-14 23:32:13,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 100, loss[loss=0.09368, beats_loss=0.007506, ecapa_loss=0.000205, whisper_loss=0.08412, over 16585.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009509, ecapa_loss=0.0001562, whisper_loss=0.09105, over 1527609.55 frames. ], batch size: 65, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:32:17,028 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
24 from LS+wenet, 21 from Vox, 27 from AS 2024-08-14 23:33:14,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2899550.0, ans=0.125 2024-08-14 23:33:31,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.639e+01 2.885e+01 3.247e+01 3.567e+02, threshold=5.770e+01, percent-clipped=1.0 2024-08-14 23:34:07,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2899850.0, ans=0.125 2024-08-14 23:34:07,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 150, loss[loss=0.08753, beats_loss=0.009352, ecapa_loss=0.0001708, whisper_loss=0.07647, over 16841.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.009483, ecapa_loss=0.0001561, whisper_loss=0.09171, over 2034961.10 frames. ], batch size: 65, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:34:15,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2899850.0, ans=0.02 2024-08-14 23:34:35,184 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 from AS 2024-08-14 23:34:41,745 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 from AS 2024-08-14 23:34:51,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=12.0 2024-08-14 23:34:51,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=22.5 2024-08-14 23:34:56,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs.
limit=5.0 2024-08-14 23:35:01,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2900150.0, ans=0.125 2024-08-14 23:35:05,693 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 22 from Vox, 17 from AS 2024-08-14 23:35:15,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2900250.0, ans=0.0 2024-08-14 23:35:24,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2900250.0, ans=0.125 2024-08-14 23:35:31,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 200, loss[loss=0.09395, beats_loss=0.01088, ecapa_loss=0.0001402, whisper_loss=0.08167, over 18780.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009882, ecapa_loss=0.0001548, whisper_loss=0.08993, over 2451929.18 frames. ], batch size: 72, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:35:34,846 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.261e-02 2024-08-14 23:35:34,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2900350.0, ans=0.125 2024-08-14 23:35:36,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2900350.0, ans=10.0 2024-08-14 23:35:37,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2024-08-14 23:35:43,249 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 21 from Vox, 31 from AS 2024-08-14 23:35:50,477 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
29 from LS+wenet, 22 from Vox, 35 from AS 2024-08-14 23:35:53,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2900450.0, ans=0.0 2024-08-14 23:36:01,081 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 from AS 2024-08-14 23:36:22,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-14 23:36:23,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900650.0, ans=0.1 2024-08-14 23:36:23,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.304e+01 2.562e+01 2.943e+01 6.143e+01, threshold=5.124e+01, percent-clipped=1.0 2024-08-14 23:36:36,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2900750.0, ans=0.0 2024-08-14 23:36:45,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2900750.0, ans=0.0 2024-08-14 23:36:49,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 250, loss[loss=0.07973, beats_loss=0.01354, ecapa_loss=0.0001257, whisper_loss=0.06493, over 20639.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0101, ecapa_loss=0.0001542, whisper_loss=0.08941, over 2756446.21 frames. ], batch size: 84, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:37:23,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2901050.0, ans=0.07 2024-08-14 23:37:34,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2024-08-14 23:37:35,975 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
20 from LS+wenet, 19 from Vox, 29 from AS 2024-08-14 23:37:37,372 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 13 from LS+wenet, 26 from Vox, 35 from AS 2024-08-14 23:37:45,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2901150.0, ans=0.125 2024-08-14 23:37:46,420 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 from AS 2024-08-14 23:38:01,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 300, loss[loss=0.08872, beats_loss=0.01103, ecapa_loss=0.0001562, whisper_loss=0.07612, over 16089.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.0001547, whisper_loss=0.08998, over 2996741.01 frames. ], batch size: 63, lr: 3.00e-03, grad_scale: 5.764607523034235e+17 2024-08-14 23:38:16,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2901450.0, ans=0.1 2024-08-14 23:38:24,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2901450.0, ans=0.125 2024-08-14 23:38:26,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2901450.0, ans=0.0 2024-08-14 23:38:39,292 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 18 from Vox, 44 from AS 2024-08-14 23:38:49,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.216e+01 2.512e+01 2.821e+01 4.988e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-14 23:39:08,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2901750.0, ans=0.0 2024-08-14 23:39:13,187 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 350, loss[loss=0.09633, beats_loss=0.01114, ecapa_loss=0.0002096, whisper_loss=0.0831, over 16946.00 frames.
], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001538, whisper_loss=0.09005, over 3174681.14 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:39:18,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-08-14 23:39:24,809 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 from AS 2024-08-14 23:39:30,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.17 vs. limit=10.0 2024-08-14 23:39:49,207 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 23 from Vox, 28 from AS 2024-08-14 23:40:00,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2902150.0, ans=0.0 2024-08-14 23:40:27,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 400, loss[loss=0.08966, beats_loss=0.009919, ecapa_loss=0.0001997, whisper_loss=0.07774, over 21123.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.000156, whisper_loss=0.09003, over 3353915.03 frames. ], batch size: 87, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:41:07,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902550.0, ans=0.1 2024-08-14 23:41:12,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2902650.0, ans=0.125 2024-08-14 23:41:18,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.36 vs.
limit=15.0 2024-08-14 23:41:18,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.395e+01 2.699e+01 3.154e+01 2.910e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-14 23:41:38,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2902750.0, ans=0.2 2024-08-14 23:41:41,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2902750.0, ans=0.0 2024-08-14 23:41:44,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2902850.0, ans=0.05 2024-08-14 23:41:45,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 450, loss[loss=0.1136, beats_loss=0.009981, ecapa_loss=0.000141, whisper_loss=0.1022, over 22732.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01039, ecapa_loss=0.0001556, whisper_loss=0.09081, over 3476259.37 frames. ], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:41:45,840 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-14 23:41:46,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2902850.0, ans=0.0 2024-08-14 23:41:47,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2902850.0, ans=0.125 2024-08-14 23:42:07,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2902950.0, ans=0.0 2024-08-14 23:42:54,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2903250.0, ans=0.125 2024-08-14 23:42:56,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2903250.0, ans=0.1 2024-08-14 23:43:00,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2903250.0, ans=0.125 2024-08-14 23:43:02,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2903250.0, ans=0.125 2024-08-14 23:43:04,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 500, loss[loss=0.1231, beats_loss=0.008721, ecapa_loss=0.0001907, whisper_loss=0.1125, over 23680.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001528, whisper_loss=0.09016, over 3566479.43 frames. ], batch size: 93, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:43:17,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-14 23:43:18,994 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS 2024-08-14 23:43:46,012 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts.
19 from LS+wenet, 22 from Vox, 36 from AS 2024-08-14 23:43:47,232 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 from AS 2024-08-14 23:43:48,984 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 from AS 2024-08-14 23:43:55,659 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.351e+01 2.577e+01 2.922e+01 3.343e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-14 23:43:55,874 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-14 23:43:58,555 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 from AS 2024-08-14 23:43:59,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2903650.0, ans=0.1 2024-08-14 23:44:08,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-08-14 23:44:16,258 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 27 from Vox, 27 from AS 2024-08-14 23:44:18,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 550, loss[loss=0.1013, beats_loss=0.01161, ecapa_loss=0.0001491, whisper_loss=0.08819, over 15156.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.000153, whisper_loss=0.08934, over 3632081.97 frames. ], batch size: 60, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:44:32,578 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 from AS 2024-08-14 23:44:40,543 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
19 from LS+wenet, 25 from Vox, 45 from AS 2024-08-14 23:45:01,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2904150.0, ans=0.2 2024-08-14 23:45:02,946 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 13 from Vox, 31 from AS 2024-08-14 23:45:06,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2904150.0, ans=0.125 2024-08-14 23:45:06,920 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-14 23:45:12,028 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 18 from LS+wenet, 24 from Vox, 43 from AS 2024-08-14 23:45:22,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2904250.0, ans=0.125 2024-08-14 23:45:22,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2904250.0, ans=0.0 2024-08-14 23:45:24,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5 2024-08-14 23:45:24,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 600, loss[loss=0.1006, beats_loss=0.01051, ecapa_loss=0.00014, whisper_loss=0.08866, over 14277.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001525, whisper_loss=0.0893, over 3645754.48 frames.
], batch size: 54, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:45:28,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2904350.0, ans=0.125 2024-08-14 23:45:29,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2904350.0, ans=0.1 2024-08-14 23:45:32,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2024-08-14 23:45:35,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0 2024-08-14 23:45:40,340 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 from AS 2024-08-14 23:45:44,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2904450.0, ans=0.0 2024-08-14 23:45:50,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2904550.0, ans=0.04949747468305833 2024-08-14 23:45:52,572 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
23 from LS+wenet, 20 from Vox, 32 from AS 2024-08-14 23:46:08,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.318e+01 2.599e+01 2.895e+01 9.632e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-14 23:46:12,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2904650.0, ans=0.0 2024-08-14 23:46:21,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2904750.0, ans=0.0 2024-08-14 23:46:26,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2024-08-14 23:46:29,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 650, loss[loss=0.09982, beats_loss=0.01073, ecapa_loss=0.0001669, whisper_loss=0.08742, over 16877.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001521, whisper_loss=0.08933, over 3686205.21 frames. ], batch size: 67, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:46:30,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2904850.0, ans=0.125 2024-08-14 23:46:33,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2904850.0, ans=0.0 2024-08-14 23:46:54,213 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 from AS 2024-08-14 23:47:18,983 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
32 from LS+wenet, 12 from Vox, 35 from AS 2024-08-14 23:47:25,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2905250.0, ans=0.0 2024-08-14 23:47:36,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 700, loss[loss=0.1261, beats_loss=0.007935, ecapa_loss=0.0001217, whisper_loss=0.117, over 17426.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001521, whisper_loss=0.08962, over 3742901.64 frames. ], batch size: 62, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:48:21,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.342e+01 2.517e+01 2.889e+01 6.845e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-14 23:48:22,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2905650.0, ans=0.0 2024-08-14 23:48:24,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2905650.0, ans=0.125 2024-08-14 23:48:41,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 750, loss[loss=0.1045, beats_loss=0.009619, ecapa_loss=0.000133, whisper_loss=0.09355, over 18711.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001492, whisper_loss=0.08979, over 3758938.81 frames. ], batch size: 72, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:48:42,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs.
limit=6.0 2024-08-14 23:49:02,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2905950.0, ans=0.0 2024-08-14 23:49:13,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2906050.0, ans=0.0 2024-08-14 23:49:14,676 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 from AS 2024-08-14 23:49:19,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2024-08-14 23:49:39,988 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 17 from LS+wenet, 26 from Vox, 37 from AS 2024-08-14 23:49:47,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 800, loss[loss=0.09533, beats_loss=0.01273, ecapa_loss=0.0001751, whisper_loss=0.08085, over 22083.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001501, whisper_loss=0.09015, over 3798468.25 frames. ], batch size: 91, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:49:54,114 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-14 23:50:01,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=12.0 2024-08-14 23:50:05,845 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 from AS 2024-08-14 23:50:13,607 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
18 from LS+wenet, 23 from Vox, 28 from AS 2024-08-14 23:50:13,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2906550.0, ans=0.125 2024-08-14 23:50:24,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2906550.0, ans=0.125 2024-08-14 23:50:27,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2906650.0, ans=0.125 2024-08-14 23:50:31,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.222e+01 2.469e+01 2.749e+01 4.131e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-14 23:50:38,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2906750.0, ans=0.2 2024-08-14 23:50:43,717 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 14 from Vox, 33 from AS 2024-08-14 23:50:48,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-14 23:50:51,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2906850.0, ans=0.125 2024-08-14 23:50:52,377 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 850, loss[loss=0.1058, beats_loss=0.009623, ecapa_loss=0.0001389, whisper_loss=0.09482, over 23777.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001504, whisper_loss=0.08942, over 3793992.03 frames. ], batch size: 92, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:51:38,854 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
20 from LS+wenet, 13 from Vox, 24 from AS 2024-08-14 23:51:54,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=15.0 2024-08-14 23:51:58,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 900, loss[loss=0.08713, beats_loss=0.01104, ecapa_loss=0.0001163, whisper_loss=0.07492, over 13998.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001502, whisper_loss=0.08977, over 3811049.85 frames. ], batch size: 54, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:52:07,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2907350.0, ans=0.05 2024-08-14 23:52:23,962 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 from AS 2024-08-14 23:52:24,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907550.0, ans=0.1 2024-08-14 23:52:38,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2907650.0, ans=0.125 2024-08-14 23:52:43,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.513e+01 2.774e+01 3.197e+01 6.969e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-14 23:52:43,871 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 from AS 2024-08-14 23:52:50,388 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
7 from LS+wenet, 22 from Vox, 30 from AS 2024-08-14 23:52:59,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2907750.0, ans=0.2 2024-08-14 23:53:01,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-14 23:53:05,096 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 950, loss[loss=0.11, beats_loss=0.01099, ecapa_loss=0.0001673, whisper_loss=0.0973, over 16002.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001496, whisper_loss=0.08964, over 3801183.33 frames. ], batch size: 64, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:53:17,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2907950.0, ans=0.05 2024-08-14 23:53:22,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2907950.0, ans=0.125 2024-08-14 23:53:26,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2907950.0, ans=0.0 2024-08-14 23:53:29,804 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 18 from Vox, 46 from AS 2024-08-14 23:53:33,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2908050.0, ans=0.125 2024-08-14 23:53:39,272 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:54:04,182 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts.
22 from LS+wenet, 20 from Vox, 35 from AS 2024-08-14 23:54:08,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2908250.0, ans=0.1 2024-08-14 23:54:10,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1000, loss[loss=0.1145, beats_loss=0.00954, ecapa_loss=0.000153, whisper_loss=0.1034, over 22555.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001491, whisper_loss=0.08949, over 3822347.11 frames. ], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:54:18,707 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 from AS 2024-08-14 23:54:19,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2908350.0, ans=0.1 2024-08-14 23:54:35,868 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 from AS 2024-08-14 23:54:37,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.58 vs. limit=15.0 2024-08-14 23:54:55,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.358e+01 2.563e+01 2.906e+01 4.216e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-14 23:54:55,237 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 14 from Vox, 35 from AS 2024-08-14 23:54:57,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2908650.0, ans=0.1 2024-08-14 23:55:15,887 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1050, loss[loss=0.1029, beats_loss=0.009736, ecapa_loss=0.0001593, whisper_loss=0.09154, over 18832.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.000149, whisper_loss=0.08892, over 3807226.09 frames.
], batch size: 74, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:55:16,049 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS 2024-08-14 23:55:20,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-14 23:55:25,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-14 23:55:32,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2908950.0, ans=0.2 2024-08-14 23:55:34,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2908950.0, ans=0.1 2024-08-14 23:55:47,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2909050.0, ans=0.0 2024-08-14 23:55:50,030 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 from AS 2024-08-14 23:55:50,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=2909050.0, ans=0.2 2024-08-14 23:55:57,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2909150.0, ans=0.125 2024-08-14 23:56:09,809 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 from AS 2024-08-14 23:56:15,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2909250.0, ans=0.0 2024-08-14 23:56:21,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1100, loss[loss=0.107, beats_loss=0.01014, ecapa_loss=0.0001321, whisper_loss=0.0955, over 20469.00 frames.
], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001495, whisper_loss=0.0895, over 3807957.90 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:56:21,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2909350.0, ans=0.0 2024-08-14 23:56:25,317 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-14 23:56:26,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.81 vs. limit=5.0 2024-08-14 23:56:26,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2909350.0, ans=0.1 2024-08-14 23:56:38,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2909450.0, ans=0.0 2024-08-14 23:56:42,331 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 22 from Vox, 21 from AS 2024-08-14 23:56:57,868 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-14 23:57:03,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2909650.0, ans=0.04949747468305833 2024-08-14 23:57:05,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.324e+01 2.535e+01 2.853e+01 4.555e+01, threshold=5.069e+01, percent-clipped=0.0 2024-08-14 23:57:08,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2909650.0, ans=0.0 2024-08-14 23:57:15,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2909750.0, ans=0.125 2024-08-14 23:57:20,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2909750.0, ans=0.125 2024-08-14 23:57:26,556 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1150, loss[loss=0.1195, beats_loss=0.007741, ecapa_loss=0.0001776, whisper_loss=0.1099, over 20982.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001496, whisper_loss=0.09005, over 3811892.34 frames. ], batch size: 82, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:58:18,794 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08089441806077957, model_norm_threshold=50.69233322143555 2024-08-14 23:58:18,983 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.506e+05, grad_sumsq=1.506e+05, orig_rms_sq=1.000e+00 2024-08-14 23:58:22,058 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-14 23:58:23,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. 
limit=15.0 2024-08-14 23:58:32,286 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1200, loss[loss=0.08791, beats_loss=0.01133, ecapa_loss=0.0001652, whisper_loss=0.07494, over 16419.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01057, ecapa_loss=0.0001499, whisper_loss=0.08923, over 3794388.16 frames. ], batch size: 67, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:58:32,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2910350.0, ans=0.0 2024-08-14 23:58:35,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2910350.0, ans=0.125 2024-08-14 23:58:38,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2910350.0, ans=0.125 2024-08-14 23:58:46,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2910450.0, ans=0.125 2024-08-14 23:58:51,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2910450.0, ans=0.2 2024-08-14 23:58:54,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2910450.0, ans=0.125 2024-08-14 23:58:56,686 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 16 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-14 23:59:01,871 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-14 23:59:18,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.273e+01 2.471e+01 2.921e+01 6.266e+02, threshold=4.943e+01, percent-clipped=3.0 2024-08-14 23:59:21,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910650.0, ans=0.1 2024-08-14 23:59:24,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2910650.0, ans=0.125 2024-08-14 23:59:29,054 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-14 23:59:39,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1250, loss[loss=0.1068, beats_loss=0.01248, ecapa_loss=0.0001294, whisper_loss=0.09306, over 23061.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001489, whisper_loss=0.08927, over 3815034.55 frames. ], batch size: 90, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-14 23:59:50,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2910850.0, ans=0.125 2024-08-14 23:59:57,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2910950.0, ans=0.5 2024-08-15 00:00:05,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2910950.0, ans=0.0 2024-08-15 00:00:18,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.51 vs. limit=22.5 2024-08-15 00:00:19,264 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 00:00:26,470 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 00:00:32,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2911150.0, ans=0.0 2024-08-15 00:00:35,360 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 00:00:51,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1300, loss[loss=0.09384, beats_loss=0.01367, ecapa_loss=0.0001577, whisper_loss=0.0786, over 19038.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001486, whisper_loss=0.08971, over 3834846.96 frames. ], batch size: 78, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:00:56,242 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 00:00:58,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2911350.0, ans=0.1 2024-08-15 00:00:59,352 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 00:01:38,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2911650.0, ans=0.125 2024-08-15 00:01:40,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.220e+01 2.514e+01 2.820e+01 5.708e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-15 00:01:44,606 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 00:01:46,205 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 00:01:47,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2911650.0, ans=0.07 2024-08-15 00:01:52,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2911750.0, ans=0.2 2024-08-15 00:01:53,928 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 00:02:06,026 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 00:02:07,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1350, loss[loss=0.09913, beats_loss=0.009413, ecapa_loss=0.000142, whisper_loss=0.08829, over 18531.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.000149, whisper_loss=0.08984, over 3852820.88 frames. ], batch size: 69, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:02:20,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2911850.0, ans=15.0 2024-08-15 00:02:40,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2912050.0, ans=0.125 2024-08-15 00:02:41,118 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
27 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-15 00:02:41,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2912050.0, ans=0.125 2024-08-15 00:02:46,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2912050.0, ans=0.125 2024-08-15 00:03:01,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2912150.0, ans=0.2 2024-08-15 00:03:09,282 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 00:03:12,525 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 00:03:14,473 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 00:03:23,211 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 00:03:26,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1400, loss[loss=0.1018, beats_loss=0.01099, ecapa_loss=0.0001582, whisper_loss=0.08926, over 19382.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001486, whisper_loss=0.09025, over 3829338.34 frames. ], batch size: 75, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:03:36,128 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 00:03:36,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2912350.0, ans=0.125 2024-08-15 00:03:42,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2912450.0, ans=0.125 2024-08-15 00:03:43,292 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
22 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 00:03:44,532 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-15 00:03:52,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2912450.0, ans=0.0 2024-08-15 00:04:18,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.213e+01 2.409e+01 2.857e+01 4.258e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-15 00:04:30,470 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 00:04:34,575 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 00:04:41,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2912750.0, ans=0.0 2024-08-15 00:05:00,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1450, loss[loss=0.07982, beats_loss=0.01145, ecapa_loss=0.0001298, whisper_loss=0.06707, over 14366.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.000148, whisper_loss=0.08932, over 3796399.16 frames. ], batch size: 56, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:05:01,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2912850.0, ans=0.1 2024-08-15 00:05:12,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2912850.0, ans=0.125 2024-08-15 00:05:13,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2912850.0, ans=0.0 2024-08-15 00:05:14,751 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 00:05:20,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2912950.0, ans=0.0 2024-08-15 00:05:23,500 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 00:05:28,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2024-08-15 00:05:45,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2913050.0, ans=0.07 2024-08-15 00:05:56,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=27.88 vs. limit=22.5 2024-08-15 00:06:06,368 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-15 00:06:11,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2913250.0, ans=0.125 2024-08-15 00:06:22,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2913250.0, ans=0.2 2024-08-15 00:06:25,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1500, loss[loss=0.1236, beats_loss=0.00757, ecapa_loss=0.000167, whisper_loss=0.1144, over 15804.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.08918, over 3810008.88 frames. 
], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:06:26,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2913350.0, ans=0.1 2024-08-15 00:06:28,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2913350.0, ans=22.5 2024-08-15 00:06:45,657 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 00:06:59,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2913550.0, ans=0.125 2024-08-15 00:07:07,586 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 00:07:19,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2913650.0, ans=0.125 2024-08-15 00:07:20,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.340e+01 2.607e+01 2.901e+01 2.472e+02, threshold=5.215e+01, percent-clipped=2.0 2024-08-15 00:07:35,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2913750.0, ans=0.2 2024-08-15 00:07:38,359 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 00:07:39,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2913750.0, ans=0.125 2024-08-15 00:07:54,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1550, loss[loss=0.09498, beats_loss=0.01195, ecapa_loss=0.0001497, whisper_loss=0.08154, over 21671.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.08971, over 3832007.45 frames. 
], batch size: 88, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:08:03,999 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 00:08:07,895 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-15 00:08:44,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2914050.0, ans=0.125 2024-08-15 00:08:54,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2914050.0, ans=0.125 2024-08-15 00:08:54,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2914050.0, ans=0.1 2024-08-15 00:09:01,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2914150.0, ans=15.0 2024-08-15 00:09:06,621 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 00:09:13,273 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 29 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 00:09:36,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1600, loss[loss=0.1072, beats_loss=0.01168, ecapa_loss=0.0001291, whisper_loss=0.09419, over 21729.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01065, ecapa_loss=0.0001462, whisper_loss=0.08902, over 3819090.53 frames. 
], batch size: 84, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:09:42,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2914350.0, ans=0.125 2024-08-15 00:09:42,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2914350.0, ans=0.125 2024-08-15 00:09:43,681 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 00:09:53,061 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-15 00:10:09,835 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 00:10:14,491 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 00:10:16,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2914450.0, ans=0.0 2024-08-15 00:10:26,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2914550.0, ans=0.125 2024-08-15 00:10:49,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2914650.0, ans=0.125 2024-08-15 00:10:57,278 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.368e+01 2.559e+01 2.879e+01 3.704e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-15 00:11:36,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1650, loss[loss=0.09472, beats_loss=0.01349, ecapa_loss=0.0001396, whisper_loss=0.07983, over 23224.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01061, ecapa_loss=0.0001463, whisper_loss=0.0896, over 3839533.06 frames. 
], batch size: 95, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:12:10,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2914950.0, ans=0.125 2024-08-15 00:12:14,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2914950.0, ans=0.125 2024-08-15 00:12:25,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2915050.0, ans=0.125 2024-08-15 00:12:47,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2915150.0, ans=0.2 2024-08-15 00:13:00,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=12.0 2024-08-15 00:13:01,699 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 00:13:36,039 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1700, loss[loss=0.09256, beats_loss=0.009784, ecapa_loss=0.0001823, whisper_loss=0.08095, over 18751.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001477, whisper_loss=0.08973, over 3837936.29 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:13:38,775 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 00:13:55,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. 
limit=15.0 2024-08-15 00:14:07,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2915450.0, ans=0.125 2024-08-15 00:14:28,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2024-08-15 00:14:36,014 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 00:14:55,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.594e+01 2.900e+01 5.252e+01, threshold=5.187e+01, percent-clipped=1.0 2024-08-15 00:14:56,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2024-08-15 00:15:04,317 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 00:15:07,396 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-15 00:15:17,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0 2024-08-15 00:15:30,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1750, loss[loss=0.1131, beats_loss=0.009248, ecapa_loss=0.0001461, whisper_loss=0.1024, over 23548.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001477, whisper_loss=0.09042, over 3829254.35 frames. ], batch size: 89, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:15:36,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2915850.0, ans=0.125 2024-08-15 00:15:43,157 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
13 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 00:15:54,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2915950.0, ans=0.09899494936611666 2024-08-15 00:15:56,856 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 00:15:57,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2915950.0, ans=0.125 2024-08-15 00:15:57,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2915950.0, ans=0.2 2024-08-15 00:15:57,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2915950.0, ans=0.125 2024-08-15 00:16:09,700 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 00:16:17,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-15 00:16:17,678 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 00:16:27,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2916150.0, ans=0.125 2024-08-15 00:16:28,553 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 00:16:31,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2916250.0, ans=0.125 2024-08-15 00:16:31,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2916250.0, ans=0.125 2024-08-15 00:16:40,278 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 00:16:42,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1800, loss[loss=0.09856, beats_loss=0.01218, ecapa_loss=0.0001641, whisper_loss=0.08474, over 16376.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001479, whisper_loss=0.09041, over 3793177.07 frames. ], batch size: 67, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:16:54,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
limit=15.0 2024-08-15 00:17:16,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2916550.0, ans=0.0 2024-08-15 00:17:27,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2916650.0, ans=0.0 2024-08-15 00:17:29,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.322e+01 2.601e+01 3.039e+01 2.068e+02, threshold=5.202e+01, percent-clipped=5.0 2024-08-15 00:17:32,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2916650.0, ans=0.2 2024-08-15 00:17:35,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2916650.0, ans=0.2 2024-08-15 00:17:38,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2916750.0, ans=0.0 2024-08-15 00:17:42,650 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 00:17:50,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2916750.0, ans=0.125 2024-08-15 00:17:52,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1850, loss[loss=0.09306, beats_loss=0.01042, ecapa_loss=0.0001579, whisper_loss=0.08106, over 15152.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001478, whisper_loss=0.09008, over 3771622.66 frames. 
], batch size: 61, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:17:59,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2916850.0, ans=0.1 2024-08-15 00:17:59,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2916850.0, ans=0.125 2024-08-15 00:18:40,627 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 00:18:50,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2917150.0, ans=0.125 2024-08-15 00:18:56,263 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-15 00:19:10,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1900, loss[loss=0.06825, beats_loss=0.01118, ecapa_loss=0.000147, whisper_loss=0.0556, over 15949.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001489, whisper_loss=0.08971, over 3775704.14 frames. ], batch size: 64, lr: 3.00e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:19:12,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2917350.0, ans=0.95 2024-08-15 00:19:21,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-08-15 00:19:35,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2917450.0, ans=0.0 2024-08-15 00:19:44,337 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
13 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 00:20:06,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.388e+01 2.711e+01 3.006e+01 3.511e+02, threshold=5.422e+01, percent-clipped=5.0 2024-08-15 00:20:22,910 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 00:20:30,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 1950, loss[loss=0.1047, beats_loss=0.01044, ecapa_loss=0.0001184, whisper_loss=0.09307, over 16692.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001497, whisper_loss=0.08991, over 3788627.61 frames. ], batch size: 59, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:20:52,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-15 00:21:35,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2918250.0, ans=0.2 2024-08-15 00:21:35,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2918250.0, ans=0.1 2024-08-15 00:21:42,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.85 vs. 
limit=22.5 2024-08-15 00:21:43,851 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.045e-02 2024-08-15 00:21:43,890 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.281e+00 2024-08-15 00:21:51,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2918350.0, ans=0.125 2024-08-15 00:21:52,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2000, loss[loss=0.1067, beats_loss=0.01052, ecapa_loss=0.0001236, whisper_loss=0.09491, over 17563.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001492, whisper_loss=0.0904, over 3813851.11 frames. ], batch size: 68, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:22:17,225 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 00:22:20,101 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:22:29,900 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 00:22:31,679 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 00:22:37,728 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:22:49,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.256e+01 2.494e+01 2.913e+01 5.528e+01, threshold=4.988e+01, percent-clipped=1.0 2024-08-15 00:22:52,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2918650.0, ans=0.0 2024-08-15 00:23:00,021 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 10 from Vox, 31 from AS 2024-08-15 00:23:15,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2050, loss[loss=0.1094, beats_loss=0.008584, ecapa_loss=0.0001609, whisper_loss=0.09922, over 20632.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001484, whisper_loss=0.08943, over 3807952.31 frames. ], batch size: 84, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:23:19,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2918850.0, ans=0.0 2024-08-15 00:23:23,043 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS 2024-08-15 00:23:25,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2918850.0, ans=0.125 2024-08-15 00:23:40,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2918950.0, ans=0.0 2024-08-15 00:23:50,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2919050.0, ans=0.95 2024-08-15 00:23:56,051 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 from AS 2024-08-15 00:23:58,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2919050.0, ans=0.125 2024-08-15 00:23:58,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.83 vs. limit=10.0 2024-08-15 00:24:05,750 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS 2024-08-15 00:24:10,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. 
limit=15.0 2024-08-15 00:24:15,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=12.0 2024-08-15 00:24:17,855 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 17 from Vox, 46 from AS 2024-08-15 00:24:21,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=12.0 2024-08-15 00:24:29,385 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 00:24:32,683 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 from AS 2024-08-15 00:24:36,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2100, loss[loss=0.1207, beats_loss=0.01039, ecapa_loss=0.0001303, whisper_loss=0.109, over 17271.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001473, whisper_loss=0.08956, over 3811708.10 frames. ], batch size: 69, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:24:37,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2919350.0, ans=0.0 2024-08-15 00:24:41,232 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 from AS 2024-08-15 00:24:58,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2024-08-15 00:25:00,973 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 00:25:09,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.60 vs. 
limit=22.5 2024-08-15 00:25:12,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=22.5 2024-08-15 00:25:21,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-08-15 00:25:31,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.305e+01 2.496e+01 2.863e+01 3.632e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 00:25:57,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2150, loss[loss=0.1013, beats_loss=0.01257, ecapa_loss=0.000124, whisper_loss=0.08748, over 20915.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01075, ecapa_loss=0.0001454, whisper_loss=0.08903, over 3813168.56 frames. ], batch size: 82, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:26:08,317 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 from AS 2024-08-15 00:26:09,681 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-15 00:26:33,935 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 from AS 2024-08-15 00:26:43,541 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 29 from Vox, 31 from AS 2024-08-15 00:26:57,219 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 from AS 2024-08-15 00:26:59,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2920150.0, ans=0.125 2024-08-15 00:27:21,797 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
27 from LS+wenet, 20 from Vox, 35 from AS 2024-08-15 00:27:26,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2200, loss[loss=0.1124, beats_loss=0.009821, ecapa_loss=0.0001356, whisper_loss=0.1013, over 18482.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.000146, whisper_loss=0.08987, over 3777399.31 frames. ], batch size: 70, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:27:38,178 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 20 from Vox, 25 from AS 2024-08-15 00:27:42,637 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 00:27:54,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2920450.0, ans=0.125 2024-08-15 00:28:20,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2920650.0, ans=0.125 2024-08-15 00:28:22,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.682e+01 3.041e+01 4.507e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-15 00:28:29,289 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-15 00:28:42,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2920750.0, ans=0.1 2024-08-15 00:28:45,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2920750.0, ans=0.07 2024-08-15 00:28:49,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2250, loss[loss=0.08931, beats_loss=0.0132, ecapa_loss=0.0001638, whisper_loss=0.07447, over 13198.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01075, ecapa_loss=0.0001464, whisper_loss=0.08981, over 3761546.73 frames. 
], batch size: 54, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:28:59,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2920850.0, ans=0.2 2024-08-15 00:29:08,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-15 00:29:10,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-15 00:29:24,088 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 32 from LS+wenet, 20 from Vox, 25 from AS 2024-08-15 00:29:24,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2921050.0, ans=0.0 2024-08-15 00:29:46,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2921150.0, ans=0.125 2024-08-15 00:29:59,418 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 00:30:08,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2921250.0, ans=0.1 2024-08-15 00:30:10,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2921350.0, ans=0.125 2024-08-15 00:30:10,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2024-08-15 00:30:11,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2300, loss[loss=0.106, beats_loss=0.01035, ecapa_loss=0.0001332, whisper_loss=0.09434, over 21288.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.000148, whisper_loss=0.09133, over 3820245.16 frames. ], batch size: 81, lr: 2.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 00:30:15,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2921350.0, ans=0.1 2024-08-15 00:30:28,889 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 from AS 2024-08-15 00:30:37,314 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 from AS 2024-08-15 00:30:37,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2921450.0, ans=0.035 2024-08-15 00:30:43,360 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 00:30:47,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=2921550.0, ans=0.2 2024-08-15 00:30:50,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2921550.0, ans=0.125 2024-08-15 00:30:56,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2921650.0, ans=0.125 2024-08-15 00:31:04,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.272e+01 2.490e+01 2.826e+01 4.749e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-15 00:31:11,175 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 from AS 2024-08-15 00:31:27,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-15 00:31:28,946 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
20 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 00:31:30,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2921750.0, ans=0.5 2024-08-15 00:31:32,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2350, loss[loss=0.1103, beats_loss=0.01048, ecapa_loss=0.0001431, whisper_loss=0.09838, over 18662.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09145, over 3826082.76 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:31:37,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2921850.0, ans=0.1 2024-08-15 00:31:56,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2921950.0, ans=0.0 2024-08-15 00:32:06,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2922050.0, ans=0.2 2024-08-15 00:32:07,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2922050.0, ans=0.2 2024-08-15 00:32:10,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2922050.0, ans=0.1 2024-08-15 00:32:11,094 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 00:32:29,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2922150.0, ans=0.1 2024-08-15 00:32:47,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2922250.0, ans=0.07 2024-08-15 00:32:47,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2024-08-15 00:32:52,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2922350.0, ans=0.0 2024-08-15 00:32:53,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2400, loss[loss=0.11, beats_loss=0.0108, ecapa_loss=0.0001666, whisper_loss=0.09754, over 18351.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001506, whisper_loss=0.09067, over 3833233.83 frames. 
], batch size: 77, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:33:01,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2922350.0, ans=0.0 2024-08-15 00:33:12,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2922450.0, ans=0.1 2024-08-15 00:33:17,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2922450.0, ans=0.05 2024-08-15 00:33:32,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2922550.0, ans=0.125 2024-08-15 00:33:46,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2922650.0, ans=0.125 2024-08-15 00:33:50,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.275e+01 2.495e+01 2.898e+01 2.121e+02, threshold=4.990e+01, percent-clipped=1.0 2024-08-15 00:34:15,615 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2450, loss[loss=0.09626, beats_loss=0.00987, ecapa_loss=0.000148, whisper_loss=0.08491, over 15114.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.000151, whisper_loss=0.09055, over 3850786.51 frames. 
], batch size: 57, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:34:21,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2922850.0, ans=0.1 2024-08-15 00:34:26,631 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.640e-03 2024-08-15 00:34:35,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2922950.0, ans=0.125 2024-08-15 00:34:37,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2922950.0, ans=0.2 2024-08-15 00:34:39,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2922950.0, ans=0.125 2024-08-15 00:34:41,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2024-08-15 00:34:44,731 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 from AS 2024-08-15 00:34:49,995 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 31 from Vox, 29 from AS 2024-08-15 00:34:51,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2923050.0, ans=0.125 2024-08-15 00:35:17,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2923150.0, ans=0.0 2024-08-15 00:35:28,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2923250.0, ans=0.0 2024-08-15 00:35:38,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2500, loss[loss=0.1005, beats_loss=0.00907, ecapa_loss=0.0001731, whisper_loss=0.08971, over 21758.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001518, whisper_loss=0.09088, over 3885227.08 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:35:40,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2923350.0, ans=0.125 2024-08-15 00:35:44,942 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 from AS 2024-08-15 00:35:54,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2923450.0, ans=0.035 2024-08-15 00:36:07,907 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 from AS 2024-08-15 00:36:15,322 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 from AS 2024-08-15 00:36:25,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2923650.0, ans=0.1 2024-08-15 00:36:32,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.328e+01 2.584e+01 2.965e+01 7.495e+01, threshold=5.168e+01, percent-clipped=2.0 2024-08-15 00:36:39,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2923650.0, ans=0.125 2024-08-15 00:36:42,178 WARNING [optim.py:496] (3/4) Scaling gradients by 0.026615602895617485, model_norm_threshold=51.67815017700195 2024-08-15 00:36:42,343 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.809e+05, grad_sumsq=6.809e+05, orig_rms_sq=1.000e+00 2024-08-15 00:36:44,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2923750.0, ans=0.0 2024-08-15 00:36:58,445 INFO [scaling.py:214] (3/4) 
ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2923850.0, ans=0.125 2024-08-15 00:36:59,185 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2550, loss[loss=0.1152, beats_loss=0.009953, ecapa_loss=0.0001581, whisper_loss=0.1037, over 18361.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001523, whisper_loss=0.09115, over 3876680.44 frames. ], batch size: 76, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:37:00,894 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 33 from Vox, 38 from AS 2024-08-15 00:37:06,544 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 14 from Vox, 45 from AS 2024-08-15 00:37:11,973 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS 2024-08-15 00:37:21,922 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 20 from Vox, 39 from AS 2024-08-15 00:37:24,520 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 from AS 2024-08-15 00:38:16,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2924350.0, ans=0.125 2024-08-15 00:38:16,925 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2600, loss[loss=0.1059, beats_loss=0.00922, ecapa_loss=0.0001528, whisper_loss=0.09518, over 20960.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.000152, whisper_loss=0.09059, over 3860206.00 frames. ], batch size: 84, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:38:18,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2924350.0, ans=15.0 2024-08-15 00:38:18,666 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 17 from Vox, 39 from AS 2024-08-15 00:38:29,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2924350.0, ans=0.125 2024-08-15 00:38:35,331 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 13 from Vox, 47 from AS 2024-08-15 00:38:44,249 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 14 from Vox, 37 from AS 2024-08-15 00:38:53,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2924550.0, ans=0.0 2024-08-15 00:39:01,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2924650.0, ans=0.125 2024-08-15 00:39:07,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2924650.0, ans=0.125 2024-08-15 00:39:08,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.339e+01 2.635e+01 2.939e+01 1.942e+03, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 00:39:12,313 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS 2024-08-15 00:39:18,227 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-15 00:39:23,971 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS 2024-08-15 00:39:24,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2924750.0, ans=0.07 2024-08-15 00:39:30,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2024-08-15 00:39:32,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2650, loss[loss=0.09843, beats_loss=0.008435, ecapa_loss=0.0001804, whisper_loss=0.08819, over 21819.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001527, whisper_loss=0.09042, over 3898622.86 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:39:41,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.75 vs. limit=22.5 2024-08-15 00:39:58,655 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 29 from Vox, 31 from AS 2024-08-15 00:40:00,001 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 from AS 2024-08-15 00:40:00,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2924950.0, ans=0.0 2024-08-15 00:40:03,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2925050.0, ans=0.125 2024-08-15 00:40:24,525 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 17 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 00:40:50,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2700, loss[loss=0.1036, beats_loss=0.00889, ecapa_loss=0.0001454, whisper_loss=0.09322, over 23300.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001525, whisper_loss=0.08956, over 3886923.91 frames. ], batch size: 91, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:41:14,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2925450.0, ans=0.125 2024-08-15 00:41:14,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=10.0 2024-08-15 00:41:21,841 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 from AS 2024-08-15 00:41:25,213 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
33 from LS+wenet, 11 from Vox, 50 from AS 2024-08-15 00:41:35,943 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 00:41:37,144 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 from AS 2024-08-15 00:41:43,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.279e+01 2.490e+01 2.726e+01 4.419e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-15 00:42:09,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2750, loss[loss=0.1175, beats_loss=0.01073, ecapa_loss=0.0001422, whisper_loss=0.1054, over 20795.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001514, whisper_loss=0.08988, over 3884934.68 frames. ], batch size: 81, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:42:13,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2024-08-15 00:42:16,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2925850.0, ans=0.125 2024-08-15 00:42:16,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=12.0 2024-08-15 00:42:19,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-08-15 00:42:34,679 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 from AS 2024-08-15 00:42:47,020 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS 2024-08-15 00:42:47,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.04 vs. 
limit=15.0 2024-08-15 00:42:47,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=15.0 2024-08-15 00:42:51,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2926050.0, ans=0.125 2024-08-15 00:43:07,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0 2024-08-15 00:43:08,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-15 00:43:15,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2926250.0, ans=0.125 2024-08-15 00:43:15,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0 2024-08-15 00:43:16,304 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 00:43:27,349 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 00:43:31,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2926350.0, ans=0.0 2024-08-15 00:43:32,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2800, loss[loss=0.09561, beats_loss=0.01209, ecapa_loss=0.000137, whisper_loss=0.08216, over 21612.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001506, whisper_loss=0.09044, over 3927991.50 frames. 
], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:43:44,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2926350.0, ans=0.125 2024-08-15 00:43:50,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2926450.0, ans=0.125 2024-08-15 00:43:55,621 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 from AS 2024-08-15 00:44:20,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-15 00:44:31,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.375e+01 2.624e+01 2.895e+01 7.200e+01, threshold=5.247e+01, percent-clipped=1.0 2024-08-15 00:44:32,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2926650.0, ans=0.125 2024-08-15 00:44:39,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2926650.0, ans=0.125 2024-08-15 00:44:40,824 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.651e-02 2024-08-15 00:45:00,074 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2850, loss[loss=0.1017, beats_loss=0.01204, ecapa_loss=0.0001374, whisper_loss=0.08832, over 23207.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001499, whisper_loss=0.0904, over 3930898.67 frames. 
], batch size: 92, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:45:04,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2926850.0, ans=0.0 2024-08-15 00:45:21,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2926950.0, ans=0.0 2024-08-15 00:45:26,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=12.0 2024-08-15 00:45:34,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2927050.0, ans=0.2 2024-08-15 00:45:35,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-15 00:46:03,921 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 10 from LS+wenet, 18 from Vox, 25 from AS 2024-08-15 00:46:10,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2927250.0, ans=0.125 2024-08-15 00:46:18,022 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 00:46:24,450 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2900, loss[loss=0.09388, beats_loss=0.01131, ecapa_loss=0.0001565, whisper_loss=0.08101, over 18686.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01068, ecapa_loss=0.00015, whisper_loss=0.09, over 3910435.60 frames. ], batch size: 81, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:46:33,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2927350.0, ans=0.125 2024-08-15 00:46:45,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.26 vs. 
limit=12.0 2024-08-15 00:46:52,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2927450.0, ans=0.0 2024-08-15 00:46:59,717 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 00:47:24,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.340e+01 2.621e+01 2.930e+01 9.659e+01, threshold=5.242e+01, percent-clipped=1.0 2024-08-15 00:47:28,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2024-08-15 00:47:37,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5 2024-08-15 00:47:43,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2927750.0, ans=0.125 2024-08-15 00:47:44,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2927750.0, ans=0.2 2024-08-15 00:47:48,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2927750.0, ans=0.0 2024-08-15 00:47:52,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 2950, loss[loss=0.09446, beats_loss=0.01232, ecapa_loss=0.0001359, whisper_loss=0.08078, over 22573.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01073, ecapa_loss=0.0001504, whisper_loss=0.08974, over 3894075.95 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:48:14,823 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 00:48:24,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. 
limit=15.0 2024-08-15 00:49:01,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2928150.0, ans=0.125 2024-08-15 00:49:01,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2928150.0, ans=0.125 2024-08-15 00:49:05,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2928250.0, ans=0.125 2024-08-15 00:49:11,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2928250.0, ans=0.125 2024-08-15 00:49:22,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3000, loss[loss=0.109, beats_loss=0.009966, ecapa_loss=0.0001349, whisper_loss=0.09765, over 14204.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001515, whisper_loss=0.09014, over 3902125.46 frames. ], batch size: 54, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:49:22,676 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 00:50:02,590 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005339, whisper_loss=0.248, over 922467.00 frames. 2024-08-15 00:50:21,784 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on SV_voxceleb1: loss=0.004208, beats_loss=0, ecapa_loss=0.0004208, whisper_loss=0, over 939242.00 frames. 2024-08-15 00:52:15,993 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 00:52:15,997 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 00:52:20,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. 
limit=15.0 2024-08-15 00:52:20,754 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 00:52:24,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2928350.0, ans=0.125 2024-08-15 00:52:41,673 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 00:52:57,865 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 00:53:08,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.494e+01 2.724e+01 3.053e+01 2.712e+02, threshold=5.448e+01, percent-clipped=2.0 2024-08-15 00:53:14,594 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 00:53:33,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2928750.0, ans=0.125 2024-08-15 00:53:37,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3050, loss[loss=0.1078, beats_loss=0.01388, ecapa_loss=0.0001276, whisper_loss=0.09267, over 22879.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001514, whisper_loss=0.0906, over 3883432.99 frames. 
], batch size: 92, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:53:48,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2928850.0, ans=0.125 2024-08-15 00:53:56,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2928950.0, ans=0.0 2024-08-15 00:54:19,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2929050.0, ans=0.0 2024-08-15 00:54:45,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=12.0 2024-08-15 00:54:49,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2929250.0, ans=0.1 2024-08-15 00:55:06,016 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3100, loss[loss=0.09102, beats_loss=0.0102, ecapa_loss=0.0001988, whisper_loss=0.07883, over 19659.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01078, ecapa_loss=0.0001517, whisper_loss=0.09029, over 3912040.57 frames. ], batch size: 83, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:55:16,954 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 00:55:27,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2929450.0, ans=0.125 2024-08-15 00:55:46,656 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 00:55:50,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2929550.0, ans=0.1 2024-08-15 00:55:51,310 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 00:55:51,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2929550.0, ans=0.07 2024-08-15 00:55:51,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2929550.0, ans=0.125 2024-08-15 00:55:56,202 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 00:56:05,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.234e+01 2.421e+01 2.829e+01 3.932e+01, threshold=4.842e+01, percent-clipped=0.0 2024-08-15 00:56:11,987 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 00:56:31,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2929750.0, ans=0.125 2024-08-15 00:56:31,825 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 00:56:34,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3150, loss[loss=0.1003, beats_loss=0.008823, ecapa_loss=0.0002306, whisper_loss=0.08921, over 15694.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001523, whisper_loss=0.09056, over 3870415.78 frames. ], batch size: 67, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:56:35,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2929850.0, ans=0.0 2024-08-15 00:57:03,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2929950.0, ans=0.1 2024-08-15 00:57:05,685 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 00:57:09,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2930050.0, ans=0.125 2024-08-15 00:57:09,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2930050.0, ans=0.125 2024-08-15 00:57:15,958 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 00:57:58,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3200, loss[loss=0.1087, beats_loss=0.01001, ecapa_loss=0.0001406, whisper_loss=0.09733, over 14234.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01068, ecapa_loss=0.0001529, whisper_loss=0.09108, over 3867451.44 frames. ], batch size: 54, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:58:12,574 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 00:58:41,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2930550.0, ans=0.0 2024-08-15 00:58:50,022 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-15 00:58:52,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2930650.0, ans=0.125 2024-08-15 00:58:58,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.340e+01 2.607e+01 2.888e+01 4.627e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 00:59:10,516 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07108230143785477, model_norm_threshold=52.141944885253906 2024-08-15 00:59:10,731 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.209e+04, grad_sumsq=9.209e+04, orig_rms_sq=1.000e+00 2024-08-15 00:59:22,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=2930750.0, ans=15.0 2024-08-15 00:59:24,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2930750.0, ans=0.0 2024-08-15 00:59:26,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3250, loss[loss=0.07807, beats_loss=0.01318, ecapa_loss=0.0001185, whisper_loss=0.06371, over 18179.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01071, ecapa_loss=0.0001529, whisper_loss=0.09106, over 3904232.56 frames. 
], batch size: 74, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 00:59:31,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2930850.0, ans=0.1 2024-08-15 01:00:05,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2931050.0, ans=0.125 2024-08-15 01:00:05,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2931050.0, ans=0.2 2024-08-15 01:00:16,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2931050.0, ans=0.0 2024-08-15 01:00:53,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3300, loss[loss=0.08871, beats_loss=0.01304, ecapa_loss=0.0001456, whisper_loss=0.07421, over 22610.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001537, whisper_loss=0.09122, over 3906218.67 frames. ], batch size: 92, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:01:22,622 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-15 01:01:29,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2931550.0, ans=0.1 2024-08-15 01:01:33,463 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 01:01:44,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2931650.0, ans=0.125 2024-08-15 01:01:47,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.317e+01 2.622e+01 2.887e+01 7.335e+02, threshold=5.244e+01, percent-clipped=2.0 2024-08-15 01:01:56,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2931650.0, ans=0.2 2024-08-15 01:01:59,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2931750.0, ans=0.05 2024-08-15 01:02:14,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3350, loss[loss=0.1078, beats_loss=0.01203, ecapa_loss=0.000163, whisper_loss=0.09414, over 22566.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001537, whisper_loss=0.09134, over 3895509.07 frames. ], batch size: 94, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:02:27,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2931850.0, ans=0.125 2024-08-15 01:02:30,898 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-15 01:02:43,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2931950.0, ans=0.025 2024-08-15 01:03:10,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2932150.0, ans=0.125 2024-08-15 01:03:28,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-15 01:03:34,921 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-15 01:03:37,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2932250.0, ans=0.1 2024-08-15 01:03:45,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3400, loss[loss=0.08612, beats_loss=0.01236, ecapa_loss=0.000146, whisper_loss=0.0723, over 21558.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001531, whisper_loss=0.09078, over 3885500.78 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:03:54,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2932350.0, ans=0.125 2024-08-15 01:04:20,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2932550.0, ans=0.0 2024-08-15 01:04:20,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2932550.0, ans=0.125 2024-08-15 01:04:28,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2932550.0, ans=0.125 2024-08-15 01:04:37,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2932650.0, ans=0.0 2024-08-15 01:04:44,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.349e+01 2.635e+01 2.975e+01 2.960e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-15 01:04:54,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2932750.0, ans=0.2 2024-08-15 01:04:58,818 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 01:05:00,181 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 01:05:02,318 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 01:05:05,099 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 01:05:06,806 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 01:05:13,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2932850.0, ans=0.2 2024-08-15 01:05:14,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3450, loss[loss=0.1026, beats_loss=0.009936, ecapa_loss=0.0001516, whisper_loss=0.09119, over 18214.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001536, whisper_loss=0.09033, over 3877268.00 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:05:23,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2932850.0, ans=0.09899494936611666 2024-08-15 01:05:29,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2932850.0, ans=0.125 2024-08-15 01:05:35,828 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 01:06:00,585 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 01:06:04,507 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-15 01:06:22,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2933150.0, ans=0.125 2024-08-15 01:06:32,069 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
36 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 01:06:37,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-08-15 01:06:38,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2933250.0, ans=0.015 2024-08-15 01:06:47,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3500, loss[loss=0.1117, beats_loss=0.01127, ecapa_loss=0.0001486, whisper_loss=0.09894, over 22833.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001531, whisper_loss=0.0903, over 3908074.96 frames. ], batch size: 93, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:06:52,441 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0579170286655426, model_norm_threshold=52.703590393066406 2024-08-15 01:06:52,612 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.021e+05, grad_sumsq=2.962e+04, orig_rms_sq=3.448e+00 2024-08-15 01:07:04,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-15 01:07:07,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-15 01:07:10,206 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 01:07:13,662 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-15 01:07:23,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=15.0 2024-08-15 01:07:27,613 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 01:07:36,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2933650.0, ans=0.125 2024-08-15 01:07:44,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.354e+01 2.571e+01 2.919e+01 9.100e+02, threshold=5.142e+01, percent-clipped=1.0 2024-08-15 01:07:55,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2933750.0, ans=0.0 2024-08-15 01:08:10,695 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3550, loss[loss=0.09611, beats_loss=0.013, ecapa_loss=0.0001344, whisper_loss=0.08177, over 21725.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001526, whisper_loss=0.0903, over 3925163.74 frames. ], batch size: 88, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:08:12,856 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 01:08:26,219 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-15 01:08:38,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2933950.0, ans=0.0 2024-08-15 01:08:47,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2934050.0, ans=0.1 2024-08-15 01:08:52,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2934050.0, ans=0.125 2024-08-15 01:09:04,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2934150.0, ans=0.125 2024-08-15 01:09:33,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3600, loss[loss=0.0786, beats_loss=0.01146, ecapa_loss=0.0001714, whisper_loss=0.06543, over 14655.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.0001519, whisper_loss=0.09, over 3876029.10 frames. ], batch size: 60, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:09:37,436 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-15 01:09:39,133 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 01:09:48,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2934350.0, ans=0.125 2024-08-15 01:09:55,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2934450.0, ans=0.04949747468305833 2024-08-15 01:10:08,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2934550.0, ans=0.125 2024-08-15 01:10:21,456 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 01:10:22,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-15 01:10:24,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. 
limit=15.0 2024-08-15 01:10:25,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2934650.0, ans=0.0 2024-08-15 01:10:27,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2934650.0, ans=0.125 2024-08-15 01:10:27,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2934650.0, ans=0.125 2024-08-15 01:10:31,170 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.251e+01 2.514e+01 2.875e+01 6.843e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-15 01:10:34,007 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 36 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 01:10:44,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2934750.0, ans=0.0 2024-08-15 01:10:58,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-15 01:10:58,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3650, loss[loss=0.09739, beats_loss=0.009686, ecapa_loss=0.0001551, whisper_loss=0.08615, over 19656.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001529, whisper_loss=0.0898, over 3866572.52 frames. ], batch size: 80, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:11:20,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. 
limit=10.0 2024-08-15 01:11:47,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2935050.0, ans=15.0 2024-08-15 01:12:19,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2935350.0, ans=0.1 2024-08-15 01:12:19,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3700, loss[loss=0.1025, beats_loss=0.009513, ecapa_loss=0.000137, whisper_loss=0.09159, over 14930.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001531, whisper_loss=0.09032, over 3858338.12 frames. ], batch size: 58, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:12:37,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2935450.0, ans=0.125 2024-08-15 01:12:52,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-15 01:13:00,199 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 01:13:17,744 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.268e+01 2.455e+01 2.851e+01 9.066e+01, threshold=4.910e+01, percent-clipped=1.0 2024-08-15 01:13:34,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2935750.0, ans=0.2 2024-08-15 01:13:45,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3750, loss[loss=0.1262, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.1144, over 23270.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001524, whisper_loss=0.0909, over 3864635.09 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:13:52,188 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 01:13:54,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2024-08-15 01:14:08,454 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-15 01:14:17,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2935950.0, ans=0.1 2024-08-15 01:14:20,848 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.932e+00 2024-08-15 01:14:25,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2936050.0, ans=0.125 2024-08-15 01:14:38,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2936150.0, ans=0.125 2024-08-15 01:14:46,198 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 01:14:52,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2936150.0, ans=0.025 2024-08-15 01:15:11,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3800, loss[loss=0.1006, beats_loss=0.01211, ecapa_loss=0.0001309, whisper_loss=0.08718, over 20773.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.000152, whisper_loss=0.09034, over 3866583.26 frames. ], batch size: 83, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:15:34,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2936450.0, ans=0.2 2024-08-15 01:15:39,619 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 01:15:41,134 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 01:15:45,410 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-15 01:15:47,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2936550.0, ans=0.0 2024-08-15 01:15:59,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2936650.0, ans=0.1 2024-08-15 01:16:06,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.293e+01 2.527e+01 3.101e+01 3.900e+02, threshold=5.055e+01, percent-clipped=2.0 2024-08-15 01:16:33,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2936850.0, ans=0.125 2024-08-15 01:16:34,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3850, loss[loss=0.07615, beats_loss=0.01004, ecapa_loss=0.0001378, whisper_loss=0.06473, over 16900.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01079, ecapa_loss=0.0001526, whisper_loss=0.08982, over 3871109.31 frames. ], batch size: 65, lr: 2.99e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:16:48,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2936850.0, ans=0.2 2024-08-15 01:17:12,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2937050.0, ans=0.0 2024-08-15 01:17:22,362 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 01:17:28,833 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 01:17:55,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2937250.0, ans=0.125 2024-08-15 01:18:01,525 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3900, loss[loss=0.1028, beats_loss=0.01006, ecapa_loss=0.0001844, whisper_loss=0.09093, over 20979.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001544, whisper_loss=0.08977, over 3869405.28 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:18:15,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2937350.0, ans=0.1 2024-08-15 01:18:19,889 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 01:18:26,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2937450.0, ans=0.125 2024-08-15 01:18:41,929 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.171e+01 2024-08-15 01:18:45,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2937550.0, ans=0.125 2024-08-15 01:18:57,025 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 15 from Vox, 26 from AS 2024-08-15 01:18:59,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.342e+01 2.596e+01 2.902e+01 5.795e+01, threshold=5.192e+01, percent-clipped=1.0 2024-08-15 01:19:12,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2937750.0, ans=0.0 2024-08-15 01:19:26,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2937850.0, ans=0.125 2024-08-15 01:19:27,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 3950, loss[loss=0.1158, beats_loss=0.01112, ecapa_loss=0.0001324, whisper_loss=0.1034, over 20756.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01077, ecapa_loss=0.0001538, whisper_loss=0.0896, over 3859824.86 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:19:59,684 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 from AS 2024-08-15 01:19:59,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2937950.0, ans=0.125 2024-08-15 01:20:14,531 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 01:20:17,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2938050.0, ans=0.125 2024-08-15 01:20:32,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2938150.0, ans=0.125 2024-08-15 01:20:42,430 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 21 from Vox, 32 from AS 2024-08-15 01:20:52,054 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
25 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 01:20:56,618 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4000, loss[loss=0.09386, beats_loss=0.0116, ecapa_loss=0.0001278, whisper_loss=0.08099, over 22507.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01081, ecapa_loss=0.0001548, whisper_loss=0.08927, over 3872788.61 frames. ], batch size: 88, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:21:18,245 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 01:21:30,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2938450.0, ans=0.125 2024-08-15 01:21:40,509 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 from AS 2024-08-15 01:21:54,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2938650.0, ans=0.0 2024-08-15 01:21:58,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.427e+01 2.607e+01 2.957e+01 4.809e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-15 01:22:05,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2938650.0, ans=0.0 2024-08-15 01:22:11,875 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 01:22:26,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2938850.0, ans=0.1 2024-08-15 01:22:27,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4050, loss[loss=0.1162, beats_loss=0.01064, ecapa_loss=0.0001152, whisper_loss=0.1044, over 24766.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001546, whisper_loss=0.0902, over 3887362.96 frames.
], batch size: 92, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:22:42,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2938850.0, ans=0.125 2024-08-15 01:23:41,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2024-08-15 01:23:45,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2939250.0, ans=0.125 2024-08-15 01:23:49,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-15 01:23:58,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2939350.0, ans=0.035 2024-08-15 01:23:58,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2939350.0, ans=0.0 2024-08-15 01:23:58,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4100, loss[loss=0.09747, beats_loss=0.01285, ecapa_loss=0.0001101, whisper_loss=0.08352, over 20574.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.0001546, whisper_loss=0.09131, over 3914558.95 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:24:01,370 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS 2024-08-15 01:24:10,097 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 01:24:14,146 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-15 01:24:23,779 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
19 from LS+wenet, 21 from Vox, 38 from AS 2024-08-15 01:24:28,547 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 from AS 2024-08-15 01:24:30,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2939450.0, ans=0.1 2024-08-15 01:24:35,539 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 34 from Vox, 34 from AS 2024-08-15 01:24:36,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-15 01:24:40,676 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 01:24:52,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2939650.0, ans=0.125 2024-08-15 01:24:58,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.312e+01 2.568e+01 2.993e+01 3.291e+02, threshold=5.136e+01, percent-clipped=2.0 2024-08-15 01:25:04,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2939650.0, ans=0.1 2024-08-15 01:25:16,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2939750.0, ans=0.0 2024-08-15 01:25:26,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4150, loss[loss=0.137, beats_loss=0.006712, ecapa_loss=0.0001583, whisper_loss=0.1287, over 20159.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001543, whisper_loss=0.09113, over 3930872.77 frames. ], batch size: 75, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:25:28,387 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
18 from LS+wenet, 11 from Vox, 25 from AS 2024-08-15 01:25:29,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-15 01:25:29,822 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 from AS 2024-08-15 01:25:55,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2939950.0, ans=0.125 2024-08-15 01:26:15,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2940050.0, ans=0.125 2024-08-15 01:26:24,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2940150.0, ans=0.2 2024-08-15 01:26:32,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2940150.0, ans=0.125 2024-08-15 01:26:33,935 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 from AS 2024-08-15 01:26:38,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5 2024-08-15 01:26:52,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4200, loss[loss=0.1104, beats_loss=0.008095, ecapa_loss=0.0001641, whisper_loss=0.1007, over 15663.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01062, ecapa_loss=0.0001544, whisper_loss=0.09169, over 3937528.49 frames. ], batch size: 62, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:27:00,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.08 vs.
limit=15.0 2024-08-15 01:27:15,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2940450.0, ans=0.125 2024-08-15 01:27:17,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.97 vs. limit=22.5 2024-08-15 01:27:27,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2024-08-15 01:27:33,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-15 01:27:44,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.316e+01 2.522e+01 2.873e+01 3.693e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-15 01:28:00,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2940750.0, ans=0.125 2024-08-15 01:28:00,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-15 01:28:06,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4250, loss[loss=0.1115, beats_loss=0.0104, ecapa_loss=0.0001312, whisper_loss=0.09977, over 23323.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001546, whisper_loss=0.0908, over 3917131.27 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:28:35,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.26 vs. limit=22.5 2024-08-15 01:28:37,543 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 01:28:45,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2941050.0, ans=0.125 2024-08-15 01:28:46,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2941150.0, ans=0.0 2024-08-15 01:28:52,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2941150.0, ans=0.2 2024-08-15 01:28:52,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2941150.0, ans=0.125 2024-08-15 01:28:59,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2941150.0, ans=15.0 2024-08-15 01:29:03,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2024-08-15 01:29:10,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2941250.0, ans=0.125 2024-08-15 01:29:13,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4300, loss[loss=0.09874, beats_loss=0.00935, ecapa_loss=0.000179, whisper_loss=0.0876, over 18301.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001546, whisper_loss=0.09059, over 3889419.95 frames. ], batch size: 76, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:29:55,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.47 vs.
limit=22.5 2024-08-15 01:29:58,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.262e+01 2.467e+01 2.860e+01 4.963e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-15 01:30:17,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2941750.0, ans=0.125 2024-08-15 01:30:19,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4350, loss[loss=0.1076, beats_loss=0.01035, ecapa_loss=0.0001256, whisper_loss=0.09599, over 23814.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001548, whisper_loss=0.08995, over 3874317.37 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:30:21,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2941850.0, ans=0.07 2024-08-15 01:30:40,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-15 01:30:52,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2942050.0, ans=0.125 2024-08-15 01:30:56,655 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 from AS 2024-08-15 01:31:14,212 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 from AS 2024-08-15 01:31:16,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2942250.0, ans=0.125 2024-08-15 01:31:18,139 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 01:31:25,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4400, loss[loss=0.1196, beats_loss=0.008562, ecapa_loss=0.0001802, whisper_loss=0.1092, over 18342.00 frames.
], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001542, whisper_loss=0.09021, over 3870835.88 frames. ], batch size: 73, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:31:53,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2942550.0, ans=0.125 2024-08-15 01:32:09,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.385e+01 2.562e+01 2.975e+01 4.289e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-15 01:32:14,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2942650.0, ans=0.0 2024-08-15 01:32:17,933 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 20 from Vox, 46 from AS 2024-08-15 01:32:18,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2942750.0, ans=0.1 2024-08-15 01:32:20,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2942750.0, ans=0.5 2024-08-15 01:32:28,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2942750.0, ans=0.2 2024-08-15 01:32:30,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4450, loss[loss=0.1165, beats_loss=0.01056, ecapa_loss=0.0001631, whisper_loss=0.1043, over 18364.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001537, whisper_loss=0.09074, over 3879514.94 frames. ], batch size: 73, lr: 2.98e-03, grad_scale: 1.152921504606847e+18 2024-08-15 01:32:40,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2942850.0, ans=0.125 2024-08-15 01:32:46,157 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
27 from LS+wenet, 11 from Vox, 28 from AS 2024-08-15 01:32:55,445 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 from AS 2024-08-15 01:32:55,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2943050.0, ans=0.125 2024-08-15 01:33:27,101 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 from AS 2024-08-15 01:33:28,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2943250.0, ans=0.1 2024-08-15 01:33:36,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4500, loss[loss=0.1167, beats_loss=0.009137, ecapa_loss=0.0001275, whisper_loss=0.1063, over 19314.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001539, whisper_loss=0.0909, over 3855250.57 frames. ], batch size: 73, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:33:44,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-15 01:33:46,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2943350.0, ans=0.0 2024-08-15 01:34:07,408 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 from AS 2024-08-15 01:34:07,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2943550.0, ans=0.1 2024-08-15 01:34:09,873 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
24 from LS+wenet, 25 from Vox, 39 from AS 2024-08-15 01:34:16,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2943650.0, ans=0.125 2024-08-15 01:34:23,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.368e+01 2.661e+01 3.187e+01 2.204e+02, threshold=5.323e+01, percent-clipped=1.0 2024-08-15 01:34:42,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4550, loss[loss=0.1173, beats_loss=0.009535, ecapa_loss=0.0001462, whisper_loss=0.1063, over 18705.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.000154, whisper_loss=0.09121, over 3875553.60 frames. ], batch size: 71, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:34:47,035 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 01:34:52,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2943850.0, ans=0.1 2024-08-15 01:35:04,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2943950.0, ans=0.05 2024-08-15 01:35:10,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2944050.0, ans=0.125 2024-08-15 01:35:10,901 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts.
28 from LS+wenet, 22 from Vox, 36 from AS 2024-08-15 01:35:11,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2944050.0, ans=0.0 2024-08-15 01:35:24,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2944150.0, ans=0.09899494936611666 2024-08-15 01:35:26,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2944150.0, ans=0.0 2024-08-15 01:35:31,455 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 01:35:31,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2944150.0, ans=0.125 2024-08-15 01:35:48,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4600, loss[loss=0.103, beats_loss=0.01281, ecapa_loss=0.000126, whisper_loss=0.08892, over 16586.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01063, ecapa_loss=0.0001539, whisper_loss=0.09051, over 3859480.34 frames. ], batch size: 65, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:35:51,386 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 from AS 2024-08-15 01:35:55,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs.
limit=22.5 2024-08-15 01:36:04,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2944450.0, ans=0.125 2024-08-15 01:36:08,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2944450.0, ans=0.0 2024-08-15 01:36:11,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2944450.0, ans=0.5 2024-08-15 01:36:15,752 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 11 from LS+wenet, 24 from Vox, 27 from AS 2024-08-15 01:36:25,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2944550.0, ans=0.125 2024-08-15 01:36:29,145 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 from AS 2024-08-15 01:36:33,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.310e+01 2.603e+01 2.915e+01 4.398e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 01:36:42,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2944750.0, ans=0.125 2024-08-15 01:36:49,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2944750.0, ans=0.1 2024-08-15 01:36:50,335 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-15 01:36:53,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4650, loss[loss=0.1059, beats_loss=0.01081, ecapa_loss=0.0001509, whisper_loss=0.09353, over 14422.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001533, whisper_loss=0.09045, over 3846843.93 frames.
], batch size: 56, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:37:07,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2944950.0, ans=0.0 2024-08-15 01:37:08,645 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 from AS 2024-08-15 01:37:11,163 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-15 01:37:28,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2945050.0, ans=0.125 2024-08-15 01:37:32,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2945150.0, ans=0.125 2024-08-15 01:37:38,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2945150.0, ans=0.0 2024-08-15 01:37:42,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2945150.0, ans=0.0 2024-08-15 01:37:45,064 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 24 from Vox, 18 from AS 2024-08-15 01:37:52,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2945250.0, ans=0.0 2024-08-15 01:37:54,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2945250.0, ans=0.1 2024-08-15 01:37:59,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4700, loss[loss=0.09239, beats_loss=0.012, ecapa_loss=0.0001466, whisper_loss=0.07893, over 21613.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001529, whisper_loss=0.09027, over 3846799.49 frames.
], batch size: 87, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:38:06,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-15 01:38:28,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2945550.0, ans=0.0 2024-08-15 01:38:29,481 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 01:38:33,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2945550.0, ans=0.1 2024-08-15 01:38:44,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.361e+01 2.586e+01 2.935e+01 3.925e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-15 01:38:50,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2945750.0, ans=0.1 2024-08-15 01:38:56,830 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 from AS 2024-08-15 01:39:05,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4750, loss[loss=0.1069, beats_loss=0.01047, ecapa_loss=0.0001341, whisper_loss=0.09509, over 18692.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001538, whisper_loss=0.09, over 3848281.94 frames. ], batch size: 71, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:39:16,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2024-08-15 01:39:18,914 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
18 from LS+wenet, 15 from Vox, 35 from AS 2024-08-15 01:39:46,221 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:39:56,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2946150.0, ans=0.125 2024-08-15 01:40:02,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2946250.0, ans=0.2 2024-08-15 01:40:02,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2946250.0, ans=0.2 2024-08-15 01:40:14,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4800, loss[loss=0.1227, beats_loss=0.009576, ecapa_loss=0.0001634, whisper_loss=0.1115, over 22716.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001538, whisper_loss=0.09043, over 3877558.37 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:40:36,190 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 29 from Vox, 30 from AS 2024-08-15 01:40:39,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2946450.0, ans=0.125 2024-08-15 01:41:07,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.219e+01 2.446e+01 2.733e+01 3.979e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-15 01:41:11,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2946650.0, ans=0.0 2024-08-15 01:41:30,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4850, loss[loss=0.08381, beats_loss=0.01306, ecapa_loss=0.0001366, whisper_loss=0.06939, over 19046.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001525, whisper_loss=0.09052, over 3916829.68 frames.
], batch size: 78, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:41:37,414 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 from AS 2024-08-15 01:41:45,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2946950.0, ans=0.125 2024-08-15 01:41:45,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-15 01:41:47,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=12.0 2024-08-15 01:41:53,130 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 from AS 2024-08-15 01:42:09,021 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 from AS 2024-08-15 01:42:26,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2947150.0, ans=0.5 2024-08-15 01:42:33,997 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 34 from Vox, 39 from AS 2024-08-15 01:42:37,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2947250.0, ans=0.0 2024-08-15 01:42:49,378 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4900, loss[loss=0.1315, beats_loss=0.008214, ecapa_loss=0.0001646, whisper_loss=0.1216, over 22763.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001531, whisper_loss=0.09029, over 3891419.69 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:43:17,230 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
25 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 01:43:24,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=22.5 2024-08-15 01:43:28,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2947550.0, ans=0.2 2024-08-15 01:43:29,599 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 from AS 2024-08-15 01:43:42,815 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.371e+01 2.641e+01 2.917e+01 5.290e+01, threshold=5.283e+01, percent-clipped=1.0 2024-08-15 01:43:47,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2947650.0, ans=0.2 2024-08-15 01:44:00,853 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 01:44:04,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 4950, loss[loss=0.1244, beats_loss=0.009602, ecapa_loss=0.0001957, whisper_loss=0.1129, over 22496.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01071, ecapa_loss=0.0001521, whisper_loss=0.09039, over 3867281.60 frames. ], batch size: 90, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:44:14,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2947850.0, ans=0.0 2024-08-15 01:44:14,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2947850.0, ans=0.125 2024-08-15 01:44:20,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=12.0 2024-08-15 01:44:26,812 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts.
35 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 01:44:57,784 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 29 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-15 01:45:10,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2024-08-15 01:45:13,205 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5000, loss[loss=0.1186, beats_loss=0.009968, ecapa_loss=0.0001459, whisper_loss=0.1072, over 24166.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01073, ecapa_loss=0.0001516, whisper_loss=0.0903, over 3866287.92 frames. ], batch size: 95, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:45:25,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2948450.0, ans=0.125 2024-08-15 01:45:34,278 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 01:45:34,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2948450.0, ans=0.5 2024-08-15 01:45:36,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2948450.0, ans=0.0 2024-08-15 01:45:47,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2948550.0, ans=0.1 2024-08-15 01:45:47,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2948550.0, ans=0.125 2024-08-15 01:45:58,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.280e+01 2.510e+01 2.795e+01 4.349e+01, threshold=5.019e+01, percent-clipped=0.0 2024-08-15 01:46:11,783 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 01:46:12,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2948750.0, ans=0.1 2024-08-15 01:46:14,167 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 01:46:17,855 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5050, loss[loss=0.08468, beats_loss=0.01125, ecapa_loss=0.000146, whisper_loss=0.07197, over 17659.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01078, ecapa_loss=0.0001518, whisper_loss=0.09023, over 3888574.55 frames. ], batch size: 69, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:46:23,327 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 01:46:24,601 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 01:46:34,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2948950.0, ans=0.1 2024-08-15 01:46:54,999 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-15 01:46:56,124 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 01:47:12,965 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 01:47:16,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2949250.0, ans=0.0 2024-08-15 01:47:17,700 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 01:47:23,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5100, loss[loss=0.06713, beats_loss=0.01291, ecapa_loss=0.0001509, whisper_loss=0.05272, over 17986.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001515, whisper_loss=0.09039, over 3869771.51 frames. ], batch size: 76, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:47:25,972 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 01:48:08,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.324e+01 2.645e+01 2.910e+01 4.236e+01, threshold=5.291e+01, percent-clipped=0.0 2024-08-15 01:48:12,714 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 01:48:22,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2949750.0, ans=0.125 2024-08-15 01:48:23,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2949750.0, ans=0.125 2024-08-15 01:48:28,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5150, loss[loss=0.1082, beats_loss=0.008704, ecapa_loss=0.0001447, whisper_loss=0.09808, over 20650.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001506, whisper_loss=0.09037, over 3864960.38 frames. ], batch size: 80, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:48:28,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2949850.0, ans=0.2 2024-08-15 01:48:51,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2949950.0, ans=0.0 2024-08-15 01:49:05,709 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-15 01:49:09,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2950150.0, ans=0.2 2024-08-15 01:49:10,883 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
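The frequent "A total of N cuts. x from LS+wenet, y from Vox, z fro AS" lines ("fro" is a typo in the training script itself) are per-batch bookkeeping of where each cut came from. A sketch of that counting; representing cuts as dicts with an "origin" key is an assumption, since lhotse cuts carry their source dataset differently:

```python
from collections import Counter

# Hedged sketch of the per-batch dataset breakdown behind the log lines.
def count_cut_origins(cuts):
    """Return the total cut count and a per-dataset breakdown for one batch."""
    counts = Counter(cut["origin"] for cut in cuts)
    return sum(counts.values()), counts
```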
15 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-15 01:49:14,821 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-15 01:49:15,113 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 01:49:33,251 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5200, loss[loss=0.09474, beats_loss=0.01028, ecapa_loss=0.0001475, whisper_loss=0.08298, over 15748.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001503, whisper_loss=0.08986, over 3843929.37 frames. ], batch size: 58, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:49:37,072 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 01:49:40,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2950350.0, ans=0.125 2024-08-15 01:49:42,408 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 01:49:44,800 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 01:49:48,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2950450.0, ans=0.2 2024-08-15 01:49:55,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2950450.0, ans=0.1 2024-08-15 01:50:03,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=15.0 2024-08-15 01:50:09,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2950550.0, ans=0.125 2024-08-15 01:50:19,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.309e+01 2.539e+01 2.841e+01 2.667e+02, threshold=5.077e+01, percent-clipped=2.0 2024-08-15 01:50:26,963 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 01:50:28,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2950750.0, ans=0.0 2024-08-15 01:50:40,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5250, loss[loss=0.1088, beats_loss=0.01034, ecapa_loss=0.00017, whisper_loss=0.09679, over 19200.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001512, whisper_loss=0.09005, over 3861863.40 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:50:53,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2950950.0, ans=0.125 2024-08-15 01:50:54,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2024-08-15 01:51:12,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2951050.0, ans=0.125 2024-08-15 01:51:12,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2951050.0, ans=0.125 2024-08-15 01:51:13,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=12.0 2024-08-15 01:51:15,187 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
20 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 01:51:47,093 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 01:51:51,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5300, loss[loss=0.09863, beats_loss=0.009996, ecapa_loss=0.0001389, whisper_loss=0.08725, over 20126.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.09052, over 3874706.83 frames. ], batch size: 80, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:51:53,168 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.695e+05 2024-08-15 01:51:54,588 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 01:52:04,789 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 01:52:12,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2951450.0, ans=0.125 2024-08-15 01:52:43,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.269e+01 2.498e+01 2.805e+01 4.853e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-15 01:52:47,410 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 01:52:53,744 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 01:53:00,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2951750.0, ans=0.125 2024-08-15 01:53:02,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.29 vs. 
limit=6.0 2024-08-15 01:53:03,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2951750.0, ans=0.1 2024-08-15 01:53:03,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2951750.0, ans=0.125 2024-08-15 01:53:07,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5350, loss[loss=0.1136, beats_loss=0.009259, ecapa_loss=0.0001532, whisper_loss=0.1028, over 20002.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001513, whisper_loss=0.09036, over 3858458.37 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:53:16,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2951850.0, ans=0.0 2024-08-15 01:53:21,583 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 01:53:38,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2951950.0, ans=0.125 2024-08-15 01:53:45,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2952050.0, ans=0.2 2024-08-15 01:53:50,191 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 01:54:10,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2952250.0, ans=0.1 2024-08-15 01:54:14,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2952250.0, ans=0.025 2024-08-15 01:54:17,920 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
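The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines track hyperparameters (dropout probabilities, skip rates, bypass scale minima) whose value is a function of batch_count. A minimal sketch assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real implementation in icefall's scaling.py may differ in detail:

```python
class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in batch_count."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs; the value is held constant
        # before the first and after the last breakpoint.
        self.points = sorted(points)

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
```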
12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 01:54:26,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5400, loss[loss=0.1041, beats_loss=0.009398, ecapa_loss=0.0001596, whisper_loss=0.0931, over 17806.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001517, whisper_loss=0.09061, over 3851270.81 frames. ], batch size: 75, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:54:34,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2952350.0, ans=0.125 2024-08-15 01:54:36,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2952350.0, ans=0.2 2024-08-15 01:54:44,553 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-15 01:54:52,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-15 01:55:02,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0 2024-08-15 01:55:18,439 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
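The "Whitening: name=..., metric=X vs. limit=Y" lines compare a whiteness diagnostic of layer activations against a scheduled limit. One common such metric, and an assumption about the exact formula in icefall's scaling.py, is E[λ²]/E[λ]² over the eigenvalues of the activation covariance: it equals 1.0 for perfectly white features and grows with eigenvalue spread. A pure-Python sketch via traces (no eigendecomposition needed):

```python
# Hedged sketch of a whitening metric: 1.0 when the covariance is a multiple
# of the identity, larger when a few directions dominate. Exact formula in
# icefall is assumed, not confirmed.
def whitening_metric(x):
    """x: list of frames, each a list of channel activations."""
    num_frames = len(x)
    num_channels = len(x[0])
    means = [sum(frame[c] for frame in x) / num_frames for c in range(num_channels)]
    centered = [[v - means[c] for c, v in enumerate(frame)] for frame in x]
    cov = [[sum(f[i] * f[j] for f in centered) / num_frames
            for j in range(num_channels)] for i in range(num_channels)]
    trace = sum(cov[i][i] for i in range(num_channels))
    trace_sq = sum(cov[i][j] * cov[j][i]
                   for i in range(num_channels) for j in range(num_channels))
    # E[lambda^2] / E[lambda]^2 expressed via trace(C^2) and trace(C)
    return num_channels * trace_sq / trace ** 2
```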
36 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-15 01:55:19,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.348e+01 2.605e+01 2.891e+01 6.130e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-15 01:55:22,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2952650.0, ans=0.1 2024-08-15 01:55:25,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2952650.0, ans=0.125 2024-08-15 01:55:29,202 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.833e-02 2024-08-15 01:55:30,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2952750.0, ans=0.1 2024-08-15 01:55:44,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5450, loss[loss=0.1175, beats_loss=0.01141, ecapa_loss=0.0001298, whisper_loss=0.1048, over 24039.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001518, whisper_loss=0.091, over 3899403.82 frames. ], batch size: 94, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:55:44,390 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 01:55:52,450 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 01:56:00,827 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-15 01:56:06,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2952950.0, ans=0.04949747468305833 2024-08-15 01:56:19,856 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 01:56:36,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2024-08-15 01:56:36,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2953150.0, ans=0.1 2024-08-15 01:56:43,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2024-08-15 01:56:45,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2953150.0, ans=0.0 2024-08-15 01:56:47,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2953150.0, ans=0.0 2024-08-15 01:56:53,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2953250.0, ans=0.0 2024-08-15 01:57:06,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5500, loss[loss=0.09987, beats_loss=0.0115, ecapa_loss=0.0001386, whisper_loss=0.08698, over 19650.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001513, whisper_loss=0.09105, over 3879190.51 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:57:14,168 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
22 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 01:57:41,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2953550.0, ans=0.125 2024-08-15 01:58:00,862 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.185e+01 2024-08-15 01:58:04,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.277e+01 2.522e+01 2.854e+01 1.046e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 01:58:27,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5550, loss[loss=0.09181, beats_loss=0.008779, ecapa_loss=0.0001822, whisper_loss=0.08121, over 17881.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001515, whisper_loss=0.09147, over 3895681.01 frames. ], batch size: 74, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 01:58:40,914 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 01:58:49,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2953950.0, ans=0.125 2024-08-15 01:58:50,088 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 16 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 01:58:52,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0 2024-08-15 01:59:01,384 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 01:59:22,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2954150.0, ans=0.125 2024-08-15 01:59:47,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5600, loss[loss=0.1007, beats_loss=0.01238, ecapa_loss=0.0001594, whisper_loss=0.08676, over 18440.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001518, whisper_loss=0.09183, over 3919146.96 frames. ], batch size: 77, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:00:03,725 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-15 02:00:10,625 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 02:00:20,257 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 02:00:37,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2954650.0, ans=15.0 2024-08-15 02:00:37,736 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 02:00:43,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.248e+01 2.477e+01 2.810e+01 7.862e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-15 02:00:45,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2954650.0, ans=0.0 2024-08-15 02:01:01,430 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 02:01:06,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5650, loss[loss=0.08292, beats_loss=0.01429, ecapa_loss=0.0001738, whisper_loss=0.06689, over 20979.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001532, whisper_loss=0.09141, over 3954885.05 frames. 
], batch size: 91, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:01:14,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2954850.0, ans=0.0 2024-08-15 02:01:14,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=12.0 2024-08-15 02:01:25,229 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 02:01:31,057 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-15 02:01:35,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2955050.0, ans=0.125 2024-08-15 02:01:36,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2955050.0, ans=0.2 2024-08-15 02:02:18,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5700, loss[loss=0.1176, beats_loss=0.01143, ecapa_loss=0.0001308, whisper_loss=0.1049, over 23899.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0107, ecapa_loss=0.000152, whisper_loss=0.09062, over 3953410.02 frames. ], batch size: 93, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:02:18,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2955350.0, ans=0.125 2024-08-15 02:02:30,770 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 02:02:36,022 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 02:02:43,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2955450.0, ans=0.0 2024-08-15 02:02:59,648 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 02:03:06,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.504e+01 2.865e+01 3.291e+01 2.428e+02, threshold=5.731e+01, percent-clipped=5.0 2024-08-15 02:03:08,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2955650.0, ans=0.2 2024-08-15 02:03:10,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2955650.0, ans=0.0 2024-08-15 02:03:15,300 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 36 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 02:03:16,906 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 02:03:25,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2955750.0, ans=0.125 2024-08-15 02:03:25,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2955750.0, ans=0.125 2024-08-15 02:03:27,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5750, loss[loss=0.0781, beats_loss=0.01225, ecapa_loss=0.0001256, whisper_loss=0.06459, over 17619.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001529, whisper_loss=0.09073, over 3961351.88 frames. ], batch size: 71, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:03:29,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2955850.0, ans=0.0 2024-08-15 02:03:41,463 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
24 from LS+wenet, 15 from Vox, 15 fro AS 2024-08-15 02:04:08,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2956150.0, ans=0.0 2024-08-15 02:04:14,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2956150.0, ans=0.1 2024-08-15 02:04:21,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2956250.0, ans=0.125 2024-08-15 02:04:22,394 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 02:04:30,298 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-15 02:04:33,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2956250.0, ans=0.125 2024-08-15 02:04:35,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5800, loss[loss=0.1228, beats_loss=0.009529, ecapa_loss=0.0001206, whisper_loss=0.112, over 16084.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001523, whisper_loss=0.09032, over 3923018.91 frames. ], batch size: 59, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:04:37,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2956350.0, ans=0.2 2024-08-15 02:04:42,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.97 vs. 
limit=12.0 2024-08-15 02:04:46,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2956350.0, ans=0.0 2024-08-15 02:04:47,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2956450.0, ans=0.1 2024-08-15 02:05:05,650 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 02:05:18,847 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 02:05:25,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.342e+01 2.666e+01 2.998e+01 4.632e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-15 02:05:40,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=15.0 2024-08-15 02:05:41,164 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 33 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 02:05:45,511 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5850, loss[loss=0.1076, beats_loss=0.008841, ecapa_loss=0.0001389, whisper_loss=0.09736, over 18057.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001522, whisper_loss=0.09, over 3932700.16 frames. ], batch size: 69, lr: 2.98e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:05:47,105 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 02:05:52,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2956850.0, ans=0.125 2024-08-15 02:05:55,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2956850.0, ans=0.125 2024-08-15 02:05:55,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2956850.0, ans=0.025 2024-08-15 02:05:59,070 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 02:05:59,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=12.0 2024-08-15 02:06:00,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2956950.0, ans=0.1 2024-08-15 02:06:19,401 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-15 02:06:33,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2957150.0, ans=0.125 2024-08-15 02:06:45,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2957250.0, ans=0.125 2024-08-15 02:06:54,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2957250.0, ans=0.0 2024-08-15 02:06:59,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5900, loss[loss=0.08544, beats_loss=0.01195, ecapa_loss=0.0001419, whisper_loss=0.07207, over 22604.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001518, whisper_loss=0.08993, over 3909852.21 frames. 
], batch size: 93, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:06:59,691 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-15 02:07:03,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2957350.0, ans=0.125 2024-08-15 02:07:15,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2957450.0, ans=0.125 2024-08-15 02:07:49,079 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.250e+01 2024-08-15 02:07:50,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2957650.0, ans=0.1 2024-08-15 02:07:54,753 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.296e+01 2.523e+01 2.887e+01 4.052e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-15 02:08:00,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2957750.0, ans=0.1 2024-08-15 02:08:10,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2024-08-15 02:08:15,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 5950, loss[loss=0.08848, beats_loss=0.01439, ecapa_loss=0.0001614, whisper_loss=0.07248, over 15202.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001516, whisper_loss=0.08969, over 3894310.98 frames. 
], batch size: 66, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:08:27,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2957950.0, ans=0.125 2024-08-15 02:08:37,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2957950.0, ans=0.125 2024-08-15 02:08:41,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2024-08-15 02:08:47,241 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 02:08:47,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2958050.0, ans=0.125 2024-08-15 02:08:57,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2958050.0, ans=0.125 2024-08-15 02:09:40,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6000, loss[loss=0.09243, beats_loss=0.01054, ecapa_loss=0.0001528, whisper_loss=0.08036, over 16692.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01078, ecapa_loss=0.0001525, whisper_loss=0.08966, over 3887356.30 frames. ], batch size: 70, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:09:40,776 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 02:10:24,954 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8445, 2.2371, 2.6296, 2.1816, 2.7954, 2.5489, 2.6746, 2.4398], device='cuda:3') 2024-08-15 02:10:48,019 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005526, whisper_loss=0.2479, over 922467.00 frames. 
2024-08-15 02:11:14,071 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on SV_voxceleb1: loss=0.004315, beats_loss=0, ecapa_loss=0.0004315, whisper_loss=0, over 939242.00 frames. 2024-08-15 02:13:47,569 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9388, 3.3360, 2.3085, 3.6770], device='cuda:3') 2024-08-15 02:14:20,496 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on AT_audioset: loss=0.0235, beats_loss=0.0235, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 02:14:20,500 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 02:14:39,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2958450.0, ans=0.125 2024-08-15 02:14:47,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2958450.0, ans=0.125 2024-08-15 02:15:17,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.230e+01 2.610e+01 2.865e+01 2.775e+02, threshold=5.221e+01, percent-clipped=3.0 2024-08-15 02:15:28,550 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 02:15:33,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2958750.0, ans=0.2 2024-08-15 02:15:38,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6050, loss[loss=0.1194, beats_loss=0.01117, ecapa_loss=0.0001193, whisper_loss=0.1071, over 23720.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01077, ecapa_loss=0.0001518, whisper_loss=0.08996, over 3872169.75 frames. 
], batch size: 87, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:15:47,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2958850.0, ans=0.1 2024-08-15 02:15:54,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-08-15 02:15:55,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2958950.0, ans=0.0 2024-08-15 02:15:56,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-08-15 02:16:04,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2959050.0, ans=0.2 2024-08-15 02:16:27,716 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 02:16:30,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-15 02:16:43,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6100, loss[loss=0.09902, beats_loss=0.00982, ecapa_loss=0.0001931, whisper_loss=0.08727, over 13237.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001524, whisper_loss=0.09005, over 3856645.03 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:16:58,398 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 02:16:59,704 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-15 02:17:10,061 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:17:19,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2959550.0, ans=0.1 2024-08-15 02:17:24,510 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 02:17:29,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.268e+01 2.472e+01 3.000e+01 1.337e+02, threshold=4.943e+01, percent-clipped=1.0 2024-08-15 02:17:43,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2959750.0, ans=0.1 2024-08-15 02:17:46,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2959750.0, ans=0.0 2024-08-15 02:17:49,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6150, loss[loss=0.1061, beats_loss=0.009442, ecapa_loss=0.0001544, whisper_loss=0.09512, over 22960.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.000153, whisper_loss=0.09018, over 3859426.60 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:17:49,831 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 02:17:54,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2959850.0, ans=0.2 2024-08-15 02:17:55,393 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.066e-01 2024-08-15 02:17:58,743 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09473193436861038, model_norm_threshold=49.43223190307617 2024-08-15 02:17:58,933 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.424e+04, grad_sumsq=2.424e+04, orig_rms_sq=1.000e+00 2024-08-15 02:18:00,378 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 02:18:01,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2959950.0, ans=0.1 2024-08-15 02:18:18,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2959950.0, ans=0.125 2024-08-15 02:18:22,391 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 02:18:35,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2024-08-15 02:18:41,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2960150.0, ans=0.125 2024-08-15 02:18:46,440 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 02:18:46,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2960250.0, ans=0.2 2024-08-15 02:18:56,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2960250.0, ans=0.1 2024-08-15 02:19:01,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6200, loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001052, whisper_loss=0.09033, over 23717.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001519, whisper_loss=0.09021, over 3849456.74 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:19:08,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2960350.0, ans=0.1 2024-08-15 02:19:40,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2960550.0, ans=0.5 2024-08-15 02:19:49,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.384e+01 2.613e+01 3.033e+01 5.218e+02, threshold=5.226e+01, percent-clipped=4.0 2024-08-15 02:20:06,678 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 02:20:09,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6250, loss[loss=0.1061, beats_loss=0.01116, ecapa_loss=0.0001917, whisper_loss=0.093, over 22316.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.000152, whisper_loss=0.09077, over 3830503.78 frames. ], batch size: 94, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:20:11,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.20 vs. 
limit=15.0 2024-08-15 02:20:11,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.04 vs. limit=15.0 2024-08-15 02:20:13,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2024-08-15 02:20:28,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2960950.0, ans=0.125 2024-08-15 02:20:40,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2961050.0, ans=10.0 2024-08-15 02:21:05,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2961250.0, ans=0.2 2024-08-15 02:21:10,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2961250.0, ans=0.1 2024-08-15 02:21:11,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=12.0 2024-08-15 02:21:17,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2961350.0, ans=0.125 2024-08-15 02:21:17,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6300, loss[loss=0.1157, beats_loss=0.009919, ecapa_loss=0.0001488, whisper_loss=0.1043, over 21764.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001529, whisper_loss=0.09099, over 3847936.31 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:21:18,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2024-08-15 02:21:19,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2961350.0, ans=0.125 2024-08-15 02:21:20,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2961350.0, ans=0.0 2024-08-15 02:21:23,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2961350.0, ans=0.1 2024-08-15 02:21:29,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2024-08-15 02:21:42,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=22.5 2024-08-15 02:21:55,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2961550.0, ans=0.125 2024-08-15 02:22:04,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.318e+01 2.564e+01 2.783e+01 4.377e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-15 02:22:23,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6350, loss[loss=0.1025, beats_loss=0.01016, ecapa_loss=0.0001511, whisper_loss=0.09081, over 18971.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001532, whisper_loss=0.09116, over 3843591.37 frames. ], batch size: 77, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:22:35,965 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 02:22:38,401 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-15 02:22:39,713 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 02:22:39,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2961950.0, ans=0.125 2024-08-15 02:22:42,494 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 25 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-15 02:22:53,560 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-15 02:22:56,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2962050.0, ans=0.125 2024-08-15 02:23:15,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2962250.0, ans=0.1 2024-08-15 02:23:17,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2024-08-15 02:23:19,202 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 02:23:29,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6400, loss[loss=0.0888, beats_loss=0.01348, ecapa_loss=0.0001835, whisper_loss=0.07348, over 17050.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001534, whisper_loss=0.09108, over 3843972.41 frames. ], batch size: 75, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:23:40,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2962450.0, ans=0.0 2024-08-15 02:23:43,385 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 02:23:49,115 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 02:23:59,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2024-08-15 02:24:10,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2024-08-15 02:24:14,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2962650.0, ans=0.1 2024-08-15 02:24:14,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.481e+01 2.409e+01 2.750e+01 3.071e+01 4.179e+02, threshold=5.499e+01, percent-clipped=4.0 2024-08-15 02:24:25,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2962750.0, ans=0.125 2024-08-15 02:24:29,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.37 vs. limit=6.0 2024-08-15 02:24:31,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2962750.0, ans=0.1 2024-08-15 02:24:32,371 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=9.985e-02 2024-08-15 02:24:34,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6450, loss[loss=0.1024, beats_loss=0.009701, ecapa_loss=0.0001684, whisper_loss=0.091, over 22974.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001516, whisper_loss=0.09187, over 3867525.21 frames. 
], batch size: 94, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:24:38,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2962850.0, ans=0.125 2024-08-15 02:24:43,663 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 02:25:05,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.74 vs. limit=22.5 2024-08-15 02:25:05,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2024-08-15 02:25:06,568 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 02:25:30,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2963250.0, ans=0.2 2024-08-15 02:25:33,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=15.0 2024-08-15 02:25:40,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6500, loss[loss=0.1025, beats_loss=0.01156, ecapa_loss=0.0001654, whisper_loss=0.08934, over 21277.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.0001514, whisper_loss=0.09185, over 3871485.09 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:25:51,524 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
11 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 02:26:26,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.539e+01 2.761e+01 6.579e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-15 02:26:39,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-15 02:26:41,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2963750.0, ans=0.1 2024-08-15 02:26:46,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6550, loss[loss=0.109, beats_loss=0.008351, ecapa_loss=0.0001718, whisper_loss=0.09895, over 22036.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001521, whisper_loss=0.09113, over 3883865.40 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:26:53,359 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-15 02:26:59,623 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 02:27:16,603 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-15 02:27:20,187 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 02:27:24,057 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 02:27:30,535 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 02:27:50,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6600, loss[loss=0.1056, beats_loss=0.01091, ecapa_loss=0.0001454, whisper_loss=0.09327, over 17563.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001523, whisper_loss=0.09127, over 3903306.36 frames. 
], batch size: 68, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:27:55,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2964350.0, ans=0.5 2024-08-15 02:28:13,762 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-15 02:28:15,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2964550.0, ans=0.0 2024-08-15 02:28:16,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2964550.0, ans=0.125 2024-08-15 02:28:17,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2024-08-15 02:28:25,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2964550.0, ans=0.2 2024-08-15 02:28:28,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2964650.0, ans=0.0 2024-08-15 02:28:31,028 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-15 02:28:33,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2964650.0, ans=0.2 2024-08-15 02:28:35,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.343e+01 2.654e+01 2.960e+01 4.414e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-15 02:28:39,768 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 02:28:51,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. 
limit=6.0 2024-08-15 02:28:55,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6650, loss[loss=0.1045, beats_loss=0.008861, ecapa_loss=0.0001955, whisper_loss=0.09373, over 21499.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.09142, over 3888601.33 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:28:56,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-15 02:28:58,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2964850.0, ans=0.125 2024-08-15 02:29:02,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2964850.0, ans=0.1 2024-08-15 02:29:16,965 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.887e-02 2024-08-15 02:29:42,449 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 02:29:46,063 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 02:30:01,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6700, loss[loss=0.09637, beats_loss=0.01194, ecapa_loss=0.0001155, whisper_loss=0.08328, over 20257.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01056, ecapa_loss=0.0001526, whisper_loss=0.09164, over 3902924.49 frames. 
], batch size: 79, lr: 2.97e-03, grad_scale: 1.152921504606847e+18 2024-08-15 02:30:03,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2965350.0, ans=0.035 2024-08-15 02:30:05,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2965350.0, ans=0.0 2024-08-15 02:30:13,186 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 02:30:15,656 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 23 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-15 02:30:16,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2965450.0, ans=0.2 2024-08-15 02:30:17,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2965450.0, ans=0.125 2024-08-15 02:30:30,056 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-15 02:30:31,291 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-15 02:30:51,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.368e+01 2.669e+01 2.995e+01 9.040e+01, threshold=5.338e+01, percent-clipped=3.0 2024-08-15 02:30:52,015 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-15 02:31:04,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2965750.0, ans=0.125 2024-08-15 02:31:10,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2965750.0, ans=0.125 2024-08-15 02:31:13,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2965850.0, ans=0.125 2024-08-15 02:31:14,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6750, loss[loss=0.1075, beats_loss=0.0109, ecapa_loss=0.0001828, whisper_loss=0.09475, over 20596.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001538, whisper_loss=0.09097, over 3876970.63 frames. ], batch size: 87, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:31:25,636 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 02:31:43,903 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.50 vs. 
limit=12.0 2024-08-15 02:31:45,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2966050.0, ans=0.0 2024-08-15 02:32:17,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2966250.0, ans=0.2 2024-08-15 02:32:18,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2966250.0, ans=0.04949747468305833 2024-08-15 02:32:27,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2966250.0, ans=0.125 2024-08-15 02:32:29,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.72 vs. limit=15.0 2024-08-15 02:32:29,994 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6800, loss[loss=0.1085, beats_loss=0.008603, ecapa_loss=0.0001369, whisper_loss=0.09856, over 14802.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001531, whisper_loss=0.09068, over 3875327.05 frames. ], batch size: 56, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:33:00,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2966550.0, ans=0.04949747468305833 2024-08-15 02:33:14,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2966650.0, ans=0.125 2024-08-15 02:33:17,322 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 02:33:19,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2966650.0, ans=0.125 2024-08-15 02:33:25,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.273e+01 2.576e+01 2.834e+01 4.792e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-15 02:33:28,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2966650.0, ans=0.1 2024-08-15 02:33:42,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2966750.0, ans=0.1 2024-08-15 02:33:42,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-15 02:33:43,190 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 02:33:46,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6850, loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.000145, whisper_loss=0.09058, over 16604.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.000153, whisper_loss=0.09035, over 3875195.16 frames. ], batch size: 65, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:33:50,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2966850.0, ans=0.05 2024-08-15 02:33:56,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.27 vs. 
limit=22.5 2024-08-15 02:34:04,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2966950.0, ans=0.2 2024-08-15 02:34:11,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5 2024-08-15 02:34:19,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2967050.0, ans=0.125 2024-08-15 02:34:36,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2967150.0, ans=0.125 2024-08-15 02:34:42,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2967150.0, ans=0.1 2024-08-15 02:34:44,752 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 02:34:53,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2967250.0, ans=0.0 2024-08-15 02:35:00,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2967250.0, ans=0.07 2024-08-15 02:35:05,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6900, loss[loss=0.08917, beats_loss=0.01103, ecapa_loss=0.0001506, whisper_loss=0.07664, over 22910.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001521, whisper_loss=0.09066, over 3900993.64 frames. ], batch size: 93, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:35:22,035 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
24 from LS+wenet, 15 from Vox, 40 from AS 2024-08-15 02:36:02,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.306e+01 2.522e+01 2.757e+01 3.704e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-15 02:36:05,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2967650.0, ans=10.0 2024-08-15 02:36:24,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 6950, loss[loss=0.1317, beats_loss=0.01027, ecapa_loss=0.0001387, whisper_loss=0.12, over 19307.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01077, ecapa_loss=0.000151, whisper_loss=0.09058, over 3898253.98 frames. ], batch size: 75, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:36:24,475 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 from AS 2024-08-15 02:36:33,918 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS 2024-08-15 02:36:40,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-08-15 02:37:08,723 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 19 from Vox, 22 from AS 2024-08-15 02:37:25,657 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 23 from Vox, 29 from AS 2024-08-15 02:37:40,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7000, loss[loss=0.1072, beats_loss=0.01009, ecapa_loss=0.0001831, whisper_loss=0.09526, over 20775.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001522, whisper_loss=0.09023, over 3874975.51 frames. ], batch size: 85, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:38:04,554 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 26 from Vox, 30 from AS 2024-08-15 02:38:32,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2968650.0, ans=0.125 2024-08-15 02:38:38,369 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.236e+01 2.500e+01 2.764e+01 4.319e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-15 02:38:40,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2968650.0, ans=0.125 2024-08-15 02:38:41,667 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 13 from Vox, 46 from AS 2024-08-15 02:38:52,620 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 02:39:00,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7050, loss[loss=0.1225, beats_loss=0.009419, ecapa_loss=0.0001509, whisper_loss=0.1116, over 18495.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001532, whisper_loss=0.09046, over 3874886.45 frames. ], batch size: 72, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:39:28,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2968950.0, ans=0.05 2024-08-15 02:39:40,461 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 from AS 2024-08-15 02:39:53,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2969150.0, ans=0.125 2024-08-15 02:39:55,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.45 vs. 
limit=22.5 2024-08-15 02:40:20,724 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7100, loss[loss=0.08578, beats_loss=0.0121, ecapa_loss=0.0001147, whisper_loss=0.07253, over 16998.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.09025, over 3915726.98 frames. ], batch size: 66, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:40:42,447 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 30 from Vox, 40 from AS 2024-08-15 02:40:53,242 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 10 from Vox, 24 from AS 2024-08-15 02:41:06,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=12.0 2024-08-15 02:41:11,845 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 from AS 2024-08-15 02:41:17,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2024-08-15 02:41:19,175 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.324e+01 2.523e+01 2.719e+01 3.184e+02, threshold=5.045e+01, percent-clipped=4.0 2024-08-15 02:41:30,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2969750.0, ans=0.2 2024-08-15 02:41:42,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7150, loss[loss=0.08644, beats_loss=0.01093, ecapa_loss=0.0001549, whisper_loss=0.07395, over 15374.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001518, whisper_loss=0.09023, over 3894724.55 frames. 
], batch size: 61, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:42:02,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2969950.0, ans=0.125 2024-08-15 02:42:06,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2969950.0, ans=0.0 2024-08-15 02:42:37,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2970150.0, ans=0.0 2024-08-15 02:42:40,792 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 from AS 2024-08-15 02:42:48,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:42:51,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2970250.0, ans=0.05 2024-08-15 02:43:03,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7200, loss[loss=0.1165, beats_loss=0.009867, ecapa_loss=0.0001541, whisper_loss=0.1051, over 22818.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001516, whisper_loss=0.09063, over 3910119.48 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:43:27,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2970450.0, ans=0.0 2024-08-15 02:43:38,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-15 02:43:42,660 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 16 from Vox, 45 from AS 2024-08-15 02:43:48,151 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 27 from Vox, 32 from AS 2024-08-15 02:43:58,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2970650.0, ans=0.125 2024-08-15 02:44:03,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.341e+01 2.613e+01 2.912e+01 4.502e+01, threshold=5.226e+01, percent-clipped=0.0 2024-08-15 02:44:17,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-15 02:44:24,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7250, loss[loss=0.1113, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09949, over 23324.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001523, whisper_loss=0.09011, over 3938553.96 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:44:25,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2970850.0, ans=0.2 2024-08-15 02:44:26,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2970850.0, ans=0.2 2024-08-15 02:44:43,139 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 from AS 2024-08-15 02:45:05,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. limit=10.0 2024-08-15 02:45:07,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2971050.0, ans=0.125 2024-08-15 02:45:08,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2971050.0, ans=0.125 2024-08-15 02:45:19,527 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 19 from Vox, 32 from AS 2024-08-15 02:45:22,639 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 20 from LS+wenet, 33 from Vox, 42 from AS 2024-08-15 02:45:28,902 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 02:45:36,108 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 30 from Vox, 28 from AS 2024-08-15 02:45:40,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2971250.0, ans=0.0 2024-08-15 02:45:47,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7300, loss[loss=0.09005, beats_loss=0.01331, ecapa_loss=0.0001327, whisper_loss=0.07541, over 21160.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001516, whisper_loss=0.09041, over 3955829.15 frames. ], batch size: 86, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:45:57,533 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-15 02:46:01,185 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 10 from Vox, 24 from AS 2024-08-15 02:46:07,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2971450.0, ans=0.0 2024-08-15 02:46:10,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2971450.0, ans=0.125 2024-08-15 02:46:16,770 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 from AS 2024-08-15 02:46:18,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2971550.0, ans=0.2 2024-08-15 02:46:21,623 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 22 from Vox, 33 from AS 2024-08-15 02:46:29,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2971550.0, ans=0.0 2024-08-15 02:46:31,406 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 02:46:33,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2971550.0, ans=0.125 2024-08-15 02:46:43,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=15.0 2024-08-15 02:46:45,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2971650.0, ans=0.125 2024-08-15 02:46:46,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.342e+01 2.606e+01 2.963e+01 2.884e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 02:46:47,777 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 17 from Vox, 42 from AS 2024-08-15 02:47:09,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7350, loss[loss=0.09689, beats_loss=0.01274, ecapa_loss=0.0001161, whisper_loss=0.08299, over 18035.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.0001516, whisper_loss=0.09077, over 3940572.35 frames. ], batch size: 69, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:47:18,070 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 02:47:21,255 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 from AS 2024-08-15 02:47:21,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2971850.0, ans=0.125 2024-08-15 02:47:29,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2971950.0, ans=0.125 2024-08-15 02:47:30,383 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS 2024-08-15 02:47:35,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2971950.0, ans=0.2 2024-08-15 02:48:08,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2972150.0, ans=0.125 2024-08-15 02:48:12,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2972150.0, ans=0.0 2024-08-15 02:48:19,844 INFO [train_multi_KD3.py:844] (3/4) A total of 98 cuts. 33 from LS+wenet, 36 from Vox, 29 from AS 2024-08-15 02:48:26,467 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 from AS 2024-08-15 02:48:29,975 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 from AS 2024-08-15 02:48:30,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2972250.0, ans=0.0 2024-08-15 02:48:32,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7400, loss[loss=0.1008, beats_loss=0.01182, ecapa_loss=0.0001317, whisper_loss=0.08761, over 17754.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01067, ecapa_loss=0.0001516, whisper_loss=0.09041, over 3923479.71 frames. 
], batch size: 68, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:48:36,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2972350.0, ans=0.125 2024-08-15 02:49:10,100 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 from AS 2024-08-15 02:49:15,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2024-08-15 02:49:30,636 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 18 from Vox, 40 from AS 2024-08-15 02:49:31,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.322e+01 2.605e+01 2.983e+01 4.527e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 02:49:35,221 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 from AS 2024-08-15 02:49:51,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2972750.0, ans=0.125 2024-08-15 02:49:51,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2972750.0, ans=0.125 2024-08-15 02:49:53,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2972850.0, ans=0.1 2024-08-15 02:49:53,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7450, loss[loss=0.09922, beats_loss=0.01253, ecapa_loss=0.0001264, whisper_loss=0.08543, over 22934.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001527, whisper_loss=0.09006, over 3906281.40 frames. 
], batch size: 90, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:50:04,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2972850.0, ans=0.0 2024-08-15 02:50:12,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2972950.0, ans=0.2 2024-08-15 02:50:14,816 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 02:50:51,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2973150.0, ans=0.0 2024-08-15 02:51:16,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7500, loss[loss=0.07588, beats_loss=0.01245, ecapa_loss=0.0001149, whisper_loss=0.06227, over 14208.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01066, ecapa_loss=0.0001517, whisper_loss=0.09052, over 3914397.65 frames. ], batch size: 54, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:51:21,726 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 27 from Vox, 30 from AS 2024-08-15 02:51:35,582 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 13 from Vox, 37 from AS 2024-08-15 02:51:50,473 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 16 from Vox, 37 from AS 2024-08-15 02:51:53,496 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 8 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 02:52:13,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-08-15 02:52:13,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. 
limit=15.0 2024-08-15 02:52:15,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.356e+01 2.622e+01 2.952e+01 4.347e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-15 02:52:21,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2973750.0, ans=0.125 2024-08-15 02:52:38,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7550, loss[loss=0.08493, beats_loss=0.01065, ecapa_loss=0.0001808, whisper_loss=0.07248, over 20136.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001525, whisper_loss=0.09024, over 3888364.08 frames. ], batch size: 86, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:52:45,369 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 26 from Vox, 23 from AS 2024-08-15 02:52:58,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2973950.0, ans=0.2 2024-08-15 02:53:11,912 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 02:53:17,669 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 from AS 2024-08-15 02:53:40,561 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 from AS 2024-08-15 02:53:58,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7600, loss[loss=0.1206, beats_loss=0.0104, ecapa_loss=0.0001446, whisper_loss=0.1088, over 19590.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001529, whisper_loss=0.09067, over 3907813.43 frames. ], batch size: 75, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:54:22,356 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-15 02:54:27,903 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 02:54:36,192 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 02:54:39,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2974550.0, ans=0.1 2024-08-15 02:54:41,884 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 15 from LS+wenet, 25 from Vox, 52 from AS 2024-08-15 02:54:54,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2974650.0, ans=0.0 2024-08-15 02:54:55,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.311e+01 2.587e+01 3.162e+01 4.205e+02, threshold=5.175e+01, percent-clipped=3.0 2024-08-15 02:54:58,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2974650.0, ans=0.125 2024-08-15 02:55:14,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2974750.0, ans=0.0 2024-08-15 02:55:17,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2974850.0, ans=0.2 2024-08-15 02:55:17,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7650, loss[loss=0.09388, beats_loss=0.01192, ecapa_loss=0.000152, whisper_loss=0.08045, over 17813.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001535, whisper_loss=0.09086, over 3922722.45 frames. ], batch size: 71, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:55:35,309 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
16 from LS+wenet, 16 from Vox, 24 from AS 2024-08-15 02:56:02,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2975150.0, ans=0.0 2024-08-15 02:56:05,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2975150.0, ans=0.1 2024-08-15 02:56:09,429 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 from AS 2024-08-15 02:56:15,870 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 02:56:24,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2975250.0, ans=0.07 2024-08-15 02:56:32,311 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS 2024-08-15 02:56:35,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7700, loss[loss=0.09825, beats_loss=0.01133, ecapa_loss=0.0001527, whisper_loss=0.0854, over 21393.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001528, whisper_loss=0.09063, over 3897913.86 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:56:58,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-15 02:57:03,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2975450.0, ans=0.0 2024-08-15 02:57:13,316 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
19 from LS+wenet, 13 from Vox, 30 from AS 2024-08-15 02:57:22,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2975650.0, ans=0.125 2024-08-15 02:57:25,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2975650.0, ans=0.125 2024-08-15 02:57:31,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.248e+01 2.489e+01 2.817e+01 2.674e+02, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 02:57:38,384 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 from AS 2024-08-15 02:57:52,827 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7750, loss[loss=0.1047, beats_loss=0.01136, ecapa_loss=0.0001554, whisper_loss=0.09181, over 21942.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001519, whisper_loss=0.09047, over 3889599.06 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:57:53,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2975850.0, ans=0.125 2024-08-15 02:58:11,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.85 vs. 
limit=5.0 2024-08-15 02:58:23,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2976050.0, ans=0.125 2024-08-15 02:58:33,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2976050.0, ans=0.1 2024-08-15 02:58:38,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2976150.0, ans=0.1 2024-08-15 02:58:41,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2976150.0, ans=0.125 2024-08-15 02:58:45,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2976150.0, ans=0.1 2024-08-15 02:58:52,310 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 from AS 2024-08-15 02:58:53,916 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 from AS 2024-08-15 02:58:55,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2976250.0, ans=0.1 2024-08-15 02:59:01,180 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 from AS 2024-08-15 02:59:09,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7800, loss[loss=0.1029, beats_loss=0.01002, ecapa_loss=0.000161, whisper_loss=0.09124, over 14667.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001518, whisper_loss=0.09042, over 3874004.41 frames. 
], batch size: 57, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 02:59:14,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2976350.0, ans=0.0 2024-08-15 02:59:20,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2976350.0, ans=15.0 2024-08-15 02:59:57,395 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 from AS 2024-08-15 02:59:59,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2976650.0, ans=0.0 2024-08-15 03:00:06,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.382e+01 2.617e+01 2.970e+01 1.321e+02, threshold=5.235e+01, percent-clipped=4.0 2024-08-15 03:00:08,719 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 25 from Vox, 17 from AS 2024-08-15 03:00:29,956 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-15 03:00:30,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7850, loss[loss=0.09594, beats_loss=0.01125, ecapa_loss=0.0001603, whisper_loss=0.08309, over 22299.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001526, whisper_loss=0.09084, over 3898214.53 frames. ], batch size: 92, lr: 2.97e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:00:32,623 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 14 from LS+wenet, 18 from Vox, 39 from AS 2024-08-15 03:00:32,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2976850.0, ans=0.125 2024-08-15 03:00:34,689 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
30 from LS+wenet, 18 from Vox, 39 from AS 2024-08-15 03:00:40,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2976850.0, ans=0.0 2024-08-15 03:00:46,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2976950.0, ans=0.1 2024-08-15 03:00:49,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.17 vs. limit=22.5 2024-08-15 03:00:50,352 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 from AS 2024-08-15 03:00:52,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2024-08-15 03:00:56,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2976950.0, ans=0.125 2024-08-15 03:00:56,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2976950.0, ans=0.2 2024-08-15 03:00:58,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2976950.0, ans=0.0 2024-08-15 03:01:08,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2977050.0, ans=0.125 2024-08-15 03:01:20,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2977150.0, ans=0.125 2024-08-15 03:01:53,643 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7900, loss[loss=0.1274, beats_loss=0.007415, ecapa_loss=0.0001547, whisper_loss=0.1184, over 21916.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001519, whisper_loss=0.09136, over 3881114.25 frames. ], batch size: 83, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:01:54,114 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS 2024-08-15 03:02:05,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2977350.0, ans=0.125 2024-08-15 03:02:08,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-15 03:02:15,334 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 from AS 2024-08-15 03:02:29,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2977550.0, ans=0.125 2024-08-15 03:02:50,370 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS 2024-08-15 03:02:53,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.325e+01 2.726e+01 3.089e+01 1.885e+02, threshold=5.452e+01, percent-clipped=1.0 2024-08-15 03:03:15,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 7950, loss[loss=0.106, beats_loss=0.01168, ecapa_loss=0.0001593, whisper_loss=0.09268, over 22831.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01073, ecapa_loss=0.0001515, whisper_loss=0.09095, over 3883070.02 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:03:24,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2024-08-15 03:03:35,015 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 03:03:46,566 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 03:03:53,159 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 03:03:57,567 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 03:04:16,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2978150.0, ans=0.125 2024-08-15 03:04:18,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2978150.0, ans=0.2 2024-08-15 03:04:29,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2978250.0, ans=0.2 2024-08-15 03:04:30,700 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 03:04:37,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8000, loss[loss=0.1033, beats_loss=0.01193, ecapa_loss=0.0001282, whisper_loss=0.09007, over 19178.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001526, whisper_loss=0.09112, over 3880369.78 frames. ], batch size: 76, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:04:38,941 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-15 03:04:59,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. 
limit=15.0 2024-08-15 03:05:02,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2978450.0, ans=0.1 2024-08-15 03:05:33,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2978650.0, ans=0.125 2024-08-15 03:05:35,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.399e+01 2.701e+01 3.155e+01 4.080e+02, threshold=5.401e+01, percent-clipped=3.0 2024-08-15 03:05:43,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2978750.0, ans=0.125 2024-08-15 03:05:54,834 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 03:05:56,371 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 03:05:57,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8050, loss[loss=0.09678, beats_loss=0.01256, ecapa_loss=0.0001212, whisper_loss=0.083, over 15323.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001529, whisper_loss=0.09089, over 3829296.38 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:05:58,628 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2024-08-15 03:06:33,246 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 03:06:36,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2979050.0, ans=0.125 2024-08-15 03:06:40,991 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 03:06:42,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2979050.0, ans=0.1 2024-08-15 03:06:45,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.06 vs. limit=22.5 2024-08-15 03:06:55,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2979150.0, ans=0.0 2024-08-15 03:06:56,907 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 03:06:57,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2979150.0, ans=0.125 2024-08-15 03:07:03,431 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 25 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-15 03:07:17,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8100, loss[loss=0.1196, beats_loss=0.007577, ecapa_loss=0.0002276, whisper_loss=0.1097, over 18509.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001535, whisper_loss=0.09093, over 3852238.41 frames. ], batch size: 78, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:07:18,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2979350.0, ans=0.1 2024-08-15 03:07:26,848 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-15 03:07:32,216 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-15 03:07:32,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2979350.0, ans=0.0 2024-08-15 03:07:34,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2979450.0, ans=0.09899494936611666 2024-08-15 03:07:42,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2979450.0, ans=0.125 2024-08-15 03:07:53,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2979550.0, ans=0.0 2024-08-15 03:08:05,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2979650.0, ans=0.0 2024-08-15 03:08:16,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.318e+01 2.516e+01 2.878e+01 5.938e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-15 03:08:21,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2979750.0, ans=0.125 2024-08-15 03:08:30,632 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-15 03:08:37,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2979850.0, ans=0.5 2024-08-15 03:08:38,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8150, loss[loss=0.1112, beats_loss=0.007354, ecapa_loss=0.0001784, whisper_loss=0.1021, over 16203.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001542, whisper_loss=0.0909, over 3868629.42 frames. 
], batch size: 63, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:08:41,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2979850.0, ans=15.0 2024-08-15 03:09:08,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2979950.0, ans=0.1 2024-08-15 03:09:18,013 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-15 03:09:21,288 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 03:09:27,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2024-08-15 03:09:50,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2980250.0, ans=0.1 2024-08-15 03:10:00,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8200, loss[loss=0.119, beats_loss=0.009238, ecapa_loss=0.0001233, whisper_loss=0.1085, over 24568.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001536, whisper_loss=0.0907, over 3896803.95 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:10:01,136 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 03:10:45,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2980550.0, ans=0.0 2024-08-15 03:11:00,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.326e+01 2.553e+01 2.974e+01 4.367e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 03:11:11,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2980750.0, ans=0.125 2024-08-15 03:11:23,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8250, loss[loss=0.09596, beats_loss=0.01121, ecapa_loss=0.0001834, whisper_loss=0.08292, over 21016.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001533, whisper_loss=0.09054, over 3914625.96 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:11:26,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2980850.0, ans=0.0 2024-08-15 03:11:28,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2980850.0, ans=0.0 2024-08-15 03:11:42,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2980950.0, ans=0.125 2024-08-15 03:11:47,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2980950.0, ans=0.0 2024-08-15 03:11:53,353 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 03:12:03,073 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 03:12:10,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2981050.0, ans=0.2 2024-08-15 03:12:11,782 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 03:12:23,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0 2024-08-15 03:12:37,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2981250.0, ans=0.2 2024-08-15 03:12:39,348 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 03:12:46,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2981350.0, ans=0.2 2024-08-15 03:12:47,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8300, loss[loss=0.09872, beats_loss=0.01086, ecapa_loss=0.0001729, whisper_loss=0.08614, over 19184.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001525, whisper_loss=0.09022, over 3901687.12 frames. ], batch size: 77, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:13:11,063 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:13:19,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2981450.0, ans=0.0 2024-08-15 03:13:24,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0 2024-08-15 03:13:47,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.57 vs. 
limit=22.5 2024-08-15 03:13:47,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.355e+01 2.573e+01 2.835e+01 6.620e+01, threshold=5.146e+01, percent-clipped=1.0 2024-08-15 03:13:53,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2981750.0, ans=0.125 2024-08-15 03:14:09,293 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-15 03:14:11,155 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8350, loss[loss=0.09363, beats_loss=0.01276, ecapa_loss=0.0001417, whisper_loss=0.07946, over 22566.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01076, ecapa_loss=0.0001521, whisper_loss=0.08939, over 3896108.58 frames. ], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:14:19,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-15 03:14:20,103 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 03:14:37,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-15 03:14:49,451 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-15 03:14:51,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2982050.0, ans=0.125 2024-08-15 03:14:53,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2982050.0, ans=0.0 2024-08-15 03:14:59,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.30 vs. 
limit=15.0 2024-08-15 03:15:06,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-15 03:15:19,050 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 03:15:21,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2982250.0, ans=0.035 2024-08-15 03:15:28,122 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 03:15:34,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8400, loss[loss=0.1094, beats_loss=0.01431, ecapa_loss=0.0001071, whisper_loss=0.09398, over 19245.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001512, whisper_loss=0.0901, over 3931010.62 frames. ], batch size: 75, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:15:43,556 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 03:15:57,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2982450.0, ans=0.0 2024-08-15 03:16:12,814 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 03:16:23,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2982550.0, ans=0.0 2024-08-15 03:16:30,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2982650.0, ans=0.0 2024-08-15 03:16:32,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2982650.0, ans=0.125 2024-08-15 03:16:35,080 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 03:16:36,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.324e+01 2.482e+01 2.790e+01 5.297e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-15 03:16:47,623 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 03:17:02,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8450, loss[loss=0.08543, beats_loss=0.01129, ecapa_loss=0.0001559, whisper_loss=0.07257, over 15415.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001522, whisper_loss=0.09026, over 3928299.19 frames. ], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:17:55,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.52 vs. limit=22.5 2024-08-15 03:17:58,170 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 03:18:01,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2024-08-15 03:18:04,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5 2024-08-15 03:18:06,892 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 03:18:12,385 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-15 03:18:23,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8500, loss[loss=0.135, beats_loss=0.009823, ecapa_loss=0.0001348, whisper_loss=0.1239, over 24901.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001527, whisper_loss=0.09014, over 3917213.63 frames. 
], batch size: 93, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:19:00,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2983550.0, ans=0.0 2024-08-15 03:19:05,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2983550.0, ans=0.2 2024-08-15 03:19:09,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2983550.0, ans=0.0 2024-08-15 03:19:20,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2983650.0, ans=0.2 2024-08-15 03:19:21,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.305e+01 2.558e+01 2.940e+01 1.198e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-15 03:19:27,692 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 03:19:36,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2983750.0, ans=0.125 2024-08-15 03:19:37,406 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 03:19:46,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8550, loss[loss=0.09962, beats_loss=0.0124, ecapa_loss=0.0001154, whisper_loss=0.08606, over 16157.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001517, whisper_loss=0.09034, over 3931207.40 frames. 
], batch size: 63, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:20:06,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2983950.0, ans=0.0 2024-08-15 03:20:35,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2984150.0, ans=0.125 2024-08-15 03:21:07,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8600, loss[loss=0.112, beats_loss=0.00854, ecapa_loss=0.0001674, whisper_loss=0.1018, over 18461.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001533, whisper_loss=0.09123, over 3915351.64 frames. ], batch size: 73, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:21:09,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-08-15 03:21:28,543 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 03:21:34,686 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-15 03:21:41,492 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-15 03:22:05,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2984650.0, ans=0.125 2024-08-15 03:22:06,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.403e+01 2.689e+01 2.856e+01 3.948e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-15 03:22:29,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8650, loss[loss=0.08905, beats_loss=0.01053, ecapa_loss=0.0001633, whisper_loss=0.07689, over 14328.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001538, whisper_loss=0.09103, over 3890157.51 frames. 
], batch size: 58, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:22:46,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2984950.0, ans=0.1 2024-08-15 03:22:52,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2984950.0, ans=0.1 2024-08-15 03:22:54,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2024-08-15 03:22:58,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2984950.0, ans=0.125 2024-08-15 03:23:04,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2985050.0, ans=0.0 2024-08-15 03:23:13,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2985050.0, ans=0.0 2024-08-15 03:23:23,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-15 03:23:40,212 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-15 03:23:45,026 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 03:23:54,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2985350.0, ans=0.125 2024-08-15 03:23:55,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8700, loss[loss=0.08203, beats_loss=0.01069, ecapa_loss=0.0002045, whisper_loss=0.06929, over 15775.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001529, whisper_loss=0.09105, over 3895995.14 frames. 
], batch size: 68, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:24:18,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2024-08-15 03:24:29,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2985450.0, ans=0.0 2024-08-15 03:24:55,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2024-08-15 03:25:04,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2985650.0, ans=0.025 2024-08-15 03:25:05,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.433e+01 2.657e+01 2.885e+01 1.161e+02, threshold=5.314e+01, percent-clipped=2.0 2024-08-15 03:25:07,542 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 26 from LS+wenet, 5 from Vox, 26 fro AS 2024-08-15 03:25:10,609 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 03:25:30,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8750, loss[loss=0.08924, beats_loss=0.01035, ecapa_loss=0.0001553, whisper_loss=0.07734, over 15653.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.000152, whisper_loss=0.09039, over 3885978.42 frames. ], batch size: 62, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:25:37,310 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 03:25:51,911 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 03:25:58,392 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 03:26:01,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2985950.0, ans=0.125 2024-08-15 03:26:21,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2986050.0, ans=0.0 2024-08-15 03:26:46,772 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 03:26:55,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2986250.0, ans=0.125 2024-08-15 03:27:02,089 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8800, loss[loss=0.1147, beats_loss=0.008288, ecapa_loss=0.0001539, whisper_loss=0.1049, over 20889.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001518, whisper_loss=0.09067, over 3917888.51 frames. ], batch size: 81, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:27:06,990 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 03:27:07,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2986350.0, ans=0.2 2024-08-15 03:27:16,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2986350.0, ans=0.2 2024-08-15 03:27:30,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2986450.0, ans=0.125 2024-08-15 03:27:51,563 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
23 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 03:27:58,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2986650.0, ans=0.125 2024-08-15 03:28:05,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.266e+01 2.513e+01 2.884e+01 4.202e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 03:28:17,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2986750.0, ans=0.0 2024-08-15 03:28:28,953 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8850, loss[loss=0.08161, beats_loss=0.01284, ecapa_loss=0.0001247, whisper_loss=0.06752, over 22552.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001523, whisper_loss=0.09087, over 3920637.33 frames. ], batch size: 91, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:28:48,275 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:29:05,341 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 03:29:16,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2987050.0, ans=0.2 2024-08-15 03:29:17,726 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 03:29:23,890 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-15 03:29:35,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2024-08-15 03:29:53,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.82 vs. 
limit=22.5 2024-08-15 03:29:56,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8900, loss[loss=0.09978, beats_loss=0.01147, ecapa_loss=0.0001662, whisper_loss=0.08666, over 18714.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001518, whisper_loss=0.09074, over 3900516.80 frames. ], batch size: 77, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:30:07,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2987350.0, ans=0.0 2024-08-15 03:30:10,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.35 vs. limit=22.5 2024-08-15 03:30:10,902 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 03:30:37,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2987550.0, ans=0.0 2024-08-15 03:31:01,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.296e+01 2.671e+01 2.935e+01 5.477e+01, threshold=5.343e+01, percent-clipped=1.0 2024-08-15 03:31:20,165 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 03:31:24,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2987750.0, ans=0.125 2024-08-15 03:31:27,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 8950, loss[loss=0.09082, beats_loss=0.009495, ecapa_loss=0.0001814, whisper_loss=0.07951, over 14521.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001527, whisper_loss=0.09103, over 3854114.28 frames. 
], batch size: 59, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:32:08,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2988050.0, ans=0.035 2024-08-15 03:32:27,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2988150.0, ans=0.125 2024-08-15 03:32:27,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2988150.0, ans=0.125 2024-08-15 03:32:37,917 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 34 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 03:32:53,874 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 03:32:59,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9000, loss[loss=0.12, beats_loss=0.007449, ecapa_loss=0.0001772, whisper_loss=0.1108, over 15016.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001524, whisper_loss=0.09122, over 3852026.19 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 1.152921504606847e+18 2024-08-15 03:32:59,174 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 03:33:42,789 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005419, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 03:34:03,092 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on SV_voxceleb1: loss=0.004236, beats_loss=0, ecapa_loss=0.0004236, whisper_loss=0, over 939242.00 frames. 2024-08-15 03:35:55,455 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 03:35:55,459 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 03:35:57,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2988350.0, ans=0.125 2024-08-15 03:36:05,723 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 03:36:57,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2988650.0, ans=0.125 2024-08-15 03:37:00,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.331e+01 2.598e+01 2.772e+01 4.416e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 03:37:15,046 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 25 from Vox, 28 from AS 2024-08-15 03:37:22,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9050, loss[loss=0.1095, beats_loss=0.009859, ecapa_loss=0.0001438, whisper_loss=0.09818, over 19648.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0105, ecapa_loss=0.0001532, whisper_loss=0.09196, over 3868364.08 frames. ], batch size: 78, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:37:24,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2988850.0, ans=0.125 2024-08-15 03:37:38,408 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
15 from LS+wenet, 23 from Vox, 19 from AS 2024-08-15 03:37:53,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2988950.0, ans=10.0 2024-08-15 03:38:00,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2989050.0, ans=0.125 2024-08-15 03:38:22,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2989150.0, ans=0.125 2024-08-15 03:38:22,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-08-15 03:38:28,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5 2024-08-15 03:38:43,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2989250.0, ans=0.05 2024-08-15 03:38:45,973 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 from AS 2024-08-15 03:38:49,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2989250.0, ans=0.0 2024-08-15 03:38:56,701 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9100, loss[loss=0.08076, beats_loss=0.0107, ecapa_loss=0.0001052, whisper_loss=0.06901, over 14616.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001532, whisper_loss=0.09151, over 3866052.58 frames. ], batch size: 55, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:39:12,520 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 from AS 2024-08-15 03:39:14,730 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
25 from LS+wenet, 24 from Vox, 29 from AS 2024-08-15 03:39:33,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2989450.0, ans=0.125 2024-08-15 03:40:10,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.390e+01 2.731e+01 3.078e+01 3.225e+02, threshold=5.461e+01, percent-clipped=2.0 2024-08-15 03:40:23,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2989750.0, ans=0.0 2024-08-15 03:40:33,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2989750.0, ans=10.0 2024-08-15 03:40:35,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9150, loss[loss=0.09844, beats_loss=0.01072, ecapa_loss=0.0001955, whisper_loss=0.08576, over 15560.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001538, whisper_loss=0.09146, over 3864459.67 frames. ], batch size: 67, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:40:39,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2989850.0, ans=0.1 2024-08-15 03:40:50,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2989850.0, ans=0.035 2024-08-15 03:40:59,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.64 vs.
limit=15.0 2024-08-15 03:41:17,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2990050.0, ans=0.125 2024-08-15 03:41:21,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2990050.0, ans=0.1 2024-08-15 03:41:39,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2990150.0, ans=0.125 2024-08-15 03:41:41,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2024-08-15 03:41:42,176 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 from AS 2024-08-15 03:41:49,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-15 03:41:53,648 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 from AS 2024-08-15 03:41:56,591 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 13 from LS+wenet, 27 from Vox, 32 from AS 2024-08-15 03:42:01,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2990250.0, ans=0.125 2024-08-15 03:42:04,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9200, loss[loss=0.07053, beats_loss=0.01139, ecapa_loss=0.0001511, whisper_loss=0.05763, over 14194.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001536, whisper_loss=0.09129, over 3832844.60 frames.
], batch size: 57, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:42:07,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2990350.0, ans=0.125 2024-08-15 03:42:12,173 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 03:42:32,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2990450.0, ans=0.2 2024-08-15 03:42:46,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2990550.0, ans=0.125 2024-08-15 03:42:52,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2990550.0, ans=0.2 2024-08-15 03:42:55,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=22.5 2024-08-15 03:43:09,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2990650.0, ans=0.0 2024-08-15 03:43:12,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.340e+01 2.560e+01 2.896e+01 2.197e+02, threshold=5.119e+01, percent-clipped=4.0 2024-08-15 03:43:19,629 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 from AS 2024-08-15 03:43:23,738 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 from AS 2024-08-15 03:43:35,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9250, loss[loss=0.1204, beats_loss=0.00905, ecapa_loss=0.0001437, whisper_loss=0.1099, over 18571.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001537, whisper_loss=0.0908, over 3813748.82 frames.
], batch size: 71, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:43:47,730 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 from AS 2024-08-15 03:43:52,994 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 6 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 03:44:07,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2990950.0, ans=0.2 2024-08-15 03:45:03,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=22.5 2024-08-15 03:45:08,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9300, loss[loss=0.1022, beats_loss=0.01125, ecapa_loss=0.0001547, whisper_loss=0.08943, over 23485.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001533, whisper_loss=0.08973, over 3855494.09 frames. ], batch size: 93, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:45:12,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2991350.0, ans=0.2 2024-08-15 03:45:13,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-15 03:45:27,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2024-08-15 03:45:30,252 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 from AS 2024-08-15 03:45:31,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs.
limit=15.0 2024-08-15 03:45:38,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2991450.0, ans=0.0 2024-08-15 03:45:47,310 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.842e-02 2024-08-15 03:46:18,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.307e+01 2.589e+01 2.834e+01 3.793e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-15 03:46:28,586 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 13 from Vox, 49 from AS 2024-08-15 03:46:32,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2991750.0, ans=0.125 2024-08-15 03:46:37,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2991750.0, ans=15.0 2024-08-15 03:46:42,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9350, loss[loss=0.1176, beats_loss=0.0103, ecapa_loss=0.0001029, whisper_loss=0.1063, over 21090.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001523, whisper_loss=0.09081, over 3873444.35 frames. ], batch size: 76, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:46:44,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2991850.0, ans=0.0 2024-08-15 03:46:45,775 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 03:46:58,533 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 from AS 2024-08-15 03:47:15,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2024-08-15 03:47:32,514 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
24 from LS+wenet, 27 from Vox, 38 from AS 2024-08-15 03:47:33,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2992050.0, ans=0.125 2024-08-15 03:47:48,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2992150.0, ans=0.125 2024-08-15 03:47:54,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2992250.0, ans=0.1 2024-08-15 03:48:08,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9400, loss[loss=0.1139, beats_loss=0.008618, ecapa_loss=0.0001828, whisper_loss=0.1034, over 18497.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001535, whisper_loss=0.09144, over 3866653.11 frames. ], batch size: 76, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:48:18,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2992350.0, ans=0.2 2024-08-15 03:48:32,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2992450.0, ans=0.1 2024-08-15 03:48:43,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.02 vs. limit=10.0 2024-08-15 03:48:52,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=22.5 2024-08-15 03:49:06,717 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
20 from LS+wenet, 18 from Vox, 21 from AS 2024-08-15 03:49:11,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.345e+01 2.543e+01 2.847e+01 7.002e+01, threshold=5.086e+01, percent-clipped=1.0 2024-08-15 03:49:18,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2992750.0, ans=0.125 2024-08-15 03:49:32,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9450, loss[loss=0.09198, beats_loss=0.01005, ecapa_loss=0.0001755, whisper_loss=0.08017, over 14623.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001537, whisper_loss=0.09085, over 3855501.76 frames. ], batch size: 59, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:49:33,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2992850.0, ans=0.125 2024-08-15 03:49:54,224 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 34 from LS+wenet, 25 from Vox, 22 from AS 2024-08-15 03:50:00,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0 2024-08-15 03:50:03,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2992950.0, ans=15.0 2024-08-15 03:50:07,123 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
17 from LS+wenet, 16 from Vox, 33 from AS 2024-08-15 03:50:07,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2993050.0, ans=0.025 2024-08-15 03:50:10,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2993050.0, ans=0.0 2024-08-15 03:50:48,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2993250.0, ans=0.0 2024-08-15 03:50:50,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2993250.0, ans=0.05 2024-08-15 03:50:55,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2993250.0, ans=0.125 2024-08-15 03:50:58,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9500, loss[loss=0.09, beats_loss=0.01238, ecapa_loss=0.0001575, whisper_loss=0.07605, over 21994.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001533, whisper_loss=0.09121, over 3856655.96 frames. ], batch size: 90, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:51:09,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2993350.0, ans=0.125 2024-08-15 03:51:16,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2993450.0, ans=0.125 2024-08-15 03:51:35,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2993550.0, ans=0.125 2024-08-15 03:51:52,968 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts.
21 from LS+wenet, 25 from Vox, 34 from AS 2024-08-15 03:51:53,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2993650.0, ans=0.0 2024-08-15 03:51:57,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2993650.0, ans=0.125 2024-08-15 03:51:58,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2993650.0, ans=0.1 2024-08-15 03:52:04,962 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 03:52:05,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.283e+01 2.545e+01 2.911e+01 4.109e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 03:52:09,732 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 25 from Vox, 45 from AS 2024-08-15 03:52:14,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2024-08-15 03:52:28,504 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 from AS 2024-08-15 03:52:29,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9550, loss[loss=0.097, beats_loss=0.01095, ecapa_loss=0.0001859, whisper_loss=0.08419, over 21869.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001531, whisper_loss=0.09023, over 3848179.60 frames. ], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:52:45,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2993850.0, ans=0.125 2024-08-15 03:52:46,106 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 26 from Vox, 34 from AS 2024-08-15 03:52:53,157 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
23 from LS+wenet, 27 from Vox, 42 from AS 2024-08-15 03:53:30,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-15 03:53:31,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2994150.0, ans=0.0 2024-08-15 03:53:34,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2994150.0, ans=0.05 2024-08-15 03:53:37,790 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 from AS 2024-08-15 03:53:44,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2994250.0, ans=0.2 2024-08-15 03:53:51,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2994250.0, ans=0.125 2024-08-15 03:53:51,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2994250.0, ans=0.5 2024-08-15 03:53:51,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2994250.0, ans=0.1 2024-08-15 03:53:58,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2994250.0, ans=0.125 2024-08-15 03:53:59,741 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 from AS 2024-08-15 03:54:00,903 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9600, loss[loss=0.1125, beats_loss=0.01031, ecapa_loss=0.0001275, whisper_loss=0.1009, over 16331.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001532, whisper_loss=0.09042, over 3831494.22 frames.
], batch size: 64, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:54:12,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-15 03:55:13,058 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.340e+01 2.536e+01 2.906e+01 4.631e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-15 03:55:29,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2994750.0, ans=0.125 2024-08-15 03:55:29,975 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 10 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 03:55:31,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2994750.0, ans=0.125 2024-08-15 03:55:43,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9650, loss[loss=0.09973, beats_loss=0.01245, ecapa_loss=0.0001475, whisper_loss=0.0858, over 20109.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001536, whisper_loss=0.0903, over 3843387.44 frames. ], batch size: 84, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:55:47,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2024-08-15 03:56:02,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.14 vs.
limit=22.5 2024-08-15 03:56:11,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2994950.0, ans=0.0 2024-08-15 03:56:13,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2994950.0, ans=0.125 2024-08-15 03:56:22,603 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.490e-02 2024-08-15 03:56:32,887 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 19 from Vox, 24 from AS 2024-08-15 03:56:39,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2995050.0, ans=0.1 2024-08-15 03:57:20,006 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 03:57:29,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9700, loss[loss=0.09807, beats_loss=0.01246, ecapa_loss=0.00017, whisper_loss=0.08391, over 20887.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001535, whisper_loss=0.09085, over 3850447.40 frames.
], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:57:31,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2995350.0, ans=0.1 2024-08-15 03:57:52,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2995350.0, ans=0.125 2024-08-15 03:58:13,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2995450.0, ans=0.125 2024-08-15 03:58:47,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2995550.0, ans=0.1 2024-08-15 03:58:51,967 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-15 03:58:54,249 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 03:59:10,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.634e+01 2.369e+01 2.652e+01 2.894e+01 3.989e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-15 03:59:13,208 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 21 from Vox, 27 from AS 2024-08-15 03:59:46,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9750, loss[loss=0.1182, beats_loss=0.01016, ecapa_loss=0.0001699, whisper_loss=0.1063, over 23775.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001525, whisper_loss=0.0904, over 3861133.31 frames. ], batch size: 94, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 03:59:59,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs.
limit=22.5 2024-08-15 04:00:30,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2995950.0, ans=0.125 2024-08-15 04:00:42,155 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-15 04:00:44,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2024-08-15 04:00:49,315 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-15 04:00:57,599 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 22 from Vox, 27 from AS 2024-08-15 04:01:06,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2996150.0, ans=0.125 2024-08-15 04:01:14,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-08-15 04:01:26,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2996250.0, ans=0.0 2024-08-15 04:01:53,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2996350.0, ans=0.1 2024-08-15 04:01:53,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9800, loss[loss=0.07305, beats_loss=0.01249, ecapa_loss=0.00012, whisper_loss=0.05936, over 14435.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001528, whisper_loss=0.09048, over 3872003.17 frames.
], batch size: 55, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:02:12,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2996350.0, ans=0.125 2024-08-15 04:02:25,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2996450.0, ans=0.05 2024-08-15 04:02:42,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2996450.0, ans=0.0 2024-08-15 04:03:10,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2996550.0, ans=0.125 2024-08-15 04:03:25,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.294e+01 2.579e+01 3.082e+01 3.957e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-15 04:03:28,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-08-15 04:03:30,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2996750.0, ans=0.1 2024-08-15 04:03:36,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2996750.0, ans=0.125 2024-08-15 04:03:45,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9850, loss[loss=0.08547, beats_loss=0.01353, ecapa_loss=0.0001458, whisper_loss=0.07049, over 21552.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001524, whisper_loss=0.0913, over 3862344.39 frames. 
], batch size: 92, lr: 2.96e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:04:02,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2996950.0, ans=0.125 2024-08-15 04:04:03,656 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 27 from Vox, 31 from AS 2024-08-15 04:04:09,326 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 12 from Vox, 48 from AS 2024-08-15 04:04:21,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2997050.0, ans=0.0 2024-08-15 04:04:30,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2997050.0, ans=0.2 2024-08-15 04:04:42,334 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 from AS 2024-08-15 04:04:44,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2997150.0, ans=0.0 2024-08-15 04:04:50,909 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 from AS 2024-08-15 04:04:51,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2997150.0, ans=0.0 2024-08-15 04:04:58,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. limit=6.0 2024-08-15 04:05:01,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2997250.0, ans=0.125 2024-08-15 04:05:02,890 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
14 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 04:05:03,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2997250.0, ans=0.125 2024-08-15 04:05:11,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9900, loss[loss=0.1144, beats_loss=0.007669, ecapa_loss=0.0001801, whisper_loss=0.1049, over 18975.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001525, whisper_loss=0.09138, over 3881837.37 frames. ], batch size: 76, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:05:22,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.62 vs. limit=10.0 2024-08-15 04:06:04,304 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 from AS 2024-08-15 04:06:12,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.308e+01 2.598e+01 3.035e+01 4.044e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-15 04:06:34,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 9950, loss[loss=0.09564, beats_loss=0.01142, ecapa_loss=0.0001676, whisper_loss=0.08255, over 22189.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001536, whisper_loss=0.08999, over 3862261.26 frames.
], batch size: 94, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:06:41,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2997850.0, ans=0.125 2024-08-15 04:06:52,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2997950.0, ans=0.0 2024-08-15 04:07:01,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2997950.0, ans=0.0 2024-08-15 04:07:45,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2998250.0, ans=0.0 2024-08-15 04:07:51,285 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-15 04:07:51,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2998250.0, ans=0.0 2024-08-15 04:08:01,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10000, loss[loss=0.09843, beats_loss=0.00964, ecapa_loss=0.0001832, whisper_loss=0.08696, over 15357.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001535, whisper_loss=0.08964, over 3833941.08 frames. ], batch size: 65, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:08:02,963 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 04:08:11,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2998350.0, ans=0.125 2024-08-15 04:08:15,601 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-15 04:08:31,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.37 vs. 
limit=22.5 2024-08-15 04:08:37,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=22.5 2024-08-15 04:08:37,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.33 vs. limit=10.0 2024-08-15 04:08:45,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2998550.0, ans=0.2 2024-08-15 04:09:02,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.337e+01 2.587e+01 2.892e+01 1.142e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-15 04:09:23,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10050, loss[loss=0.1162, beats_loss=0.009268, ecapa_loss=0.0001517, whisper_loss=0.1054, over 15767.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001536, whisper_loss=0.09053, over 3846014.90 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:09:34,294 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-15 04:09:47,164 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-15 04:09:49,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2998950.0, ans=0.0 2024-08-15 04:09:51,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2998950.0, ans=0.1 2024-08-15 04:09:53,790 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 04:10:09,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2999050.0, ans=0.125 2024-08-15 04:10:22,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2999150.0, ans=0.1 2024-08-15 04:10:29,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2999150.0, ans=0.125 2024-08-15 04:10:31,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2024-08-15 04:10:35,083 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 04:10:45,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2999250.0, ans=0.2 2024-08-15 04:10:47,437 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10100, loss[loss=0.08608, beats_loss=0.01155, ecapa_loss=0.0001205, whisper_loss=0.07333, over 15995.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01058, ecapa_loss=0.0001537, whisper_loss=0.09154, over 3895595.46 frames. ], batch size: 60, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:10:48,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. 
limit=15.0 2024-08-15 04:10:52,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2999350.0, ans=0.5 2024-08-15 04:11:09,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2999450.0, ans=0.2 2024-08-15 04:11:10,855 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 04:11:17,441 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 04:11:41,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2999650.0, ans=0.05 2024-08-15 04:11:43,411 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 04:11:43,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2999650.0, ans=0.5 2024-08-15 04:11:44,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.436e+01 2.693e+01 3.002e+01 5.180e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-15 04:12:05,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10150, loss[loss=0.0798, beats_loss=0.01252, ecapa_loss=0.000171, whisper_loss=0.06557, over 19523.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001553, whisper_loss=0.09114, over 3906349.63 frames. ], batch size: 84, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:12:06,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2999850.0, ans=0.2 2024-08-15 04:12:13,427 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 04:12:19,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2999950.0, ans=0.0 2024-08-15 04:12:38,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3000050.0, ans=0.0 2024-08-15 04:12:43,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3000050.0, ans=0.0 2024-08-15 04:12:48,944 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 04:12:58,336 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 04:13:00,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3000150.0, ans=0.125 2024-08-15 04:13:02,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3000150.0, ans=0.125 2024-08-15 04:13:08,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3000250.0, ans=0.2 2024-08-15 04:13:18,246 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 04:13:25,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10200, loss[loss=0.08214, beats_loss=0.01517, ecapa_loss=7.421e-05, whisper_loss=0.06622, over 14934.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001548, whisper_loss=0.09142, over 3885650.11 frames. 
], batch size: 56, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:13:26,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3000350.0, ans=0.2 2024-08-15 04:13:43,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3000450.0, ans=0.2 2024-08-15 04:13:55,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3000450.0, ans=0.125 2024-08-15 04:14:09,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3000550.0, ans=0.0 2024-08-15 04:14:09,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3000550.0, ans=0.05 2024-08-15 04:14:18,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3000650.0, ans=0.125 2024-08-15 04:14:23,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.355e+01 2.535e+01 2.807e+01 5.755e+01, threshold=5.070e+01, percent-clipped=1.0 2024-08-15 04:14:25,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3000650.0, ans=0.2 2024-08-15 04:14:35,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3000750.0, ans=0.125 2024-08-15 04:14:40,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3000750.0, ans=0.125 2024-08-15 04:14:40,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3000750.0, ans=0.5 2024-08-15 04:14:43,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 
10250, loss[loss=0.08545, beats_loss=0.01116, ecapa_loss=0.0001709, whisper_loss=0.07257, over 16481.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01049, ecapa_loss=0.0001548, whisper_loss=0.09193, over 3898831.18 frames. ], batch size: 71, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:14:43,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3000850.0, ans=0.125 2024-08-15 04:14:54,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3000850.0, ans=0.2 2024-08-15 04:15:49,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-15 04:15:52,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3001250.0, ans=0.125 2024-08-15 04:16:00,692 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10300, loss[loss=0.1045, beats_loss=0.00887, ecapa_loss=0.0001449, whisper_loss=0.09416, over 21419.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01052, ecapa_loss=0.0001531, whisper_loss=0.09152, over 3886914.35 frames. ], batch size: 84, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:16:05,429 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-15 04:16:09,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-15 04:16:12,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0 2024-08-15 04:16:20,368 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
27 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-15 04:16:26,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3001450.0, ans=0.125 2024-08-15 04:16:32,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2024-08-15 04:16:57,127 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-15 04:16:59,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.405e+01 2.691e+01 3.048e+01 4.748e+01, threshold=5.382e+01, percent-clipped=0.0 2024-08-15 04:17:07,883 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.243e+01 2024-08-15 04:17:08,818 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 04:17:19,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10350, loss[loss=0.1082, beats_loss=0.008249, ecapa_loss=0.0001808, whisper_loss=0.09813, over 16557.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01055, ecapa_loss=0.0001539, whisper_loss=0.09154, over 3912779.99 frames. ], batch size: 68, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:17:36,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2024-08-15 04:17:40,986 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-15 04:18:11,481 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-15 04:18:12,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3002150.0, ans=0.125 2024-08-15 04:18:40,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10400, loss[loss=0.1087, beats_loss=0.01203, ecapa_loss=0.0001175, whisper_loss=0.09546, over 20782.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01059, ecapa_loss=0.0001527, whisper_loss=0.09141, over 3899776.13 frames. ], batch size: 80, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:18:47,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3002350.0, ans=0.0 2024-08-15 04:18:51,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3002350.0, ans=0.125 2024-08-15 04:18:59,007 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 04:19:03,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3002450.0, ans=0.1 2024-08-15 04:19:28,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3002650.0, ans=15.0 2024-08-15 04:19:34,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.336e+01 2.571e+01 2.760e+01 5.271e+01, threshold=5.142e+01, percent-clipped=0.0 2024-08-15 04:19:38,033 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 04:19:53,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.58 vs. 
limit=22.5 2024-08-15 04:19:53,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10450, loss[loss=0.1079, beats_loss=0.01024, ecapa_loss=0.000164, whisper_loss=0.09598, over 15019.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001527, whisper_loss=0.09081, over 3881590.09 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:20:08,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3002950.0, ans=0.0 2024-08-15 04:20:32,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3003050.0, ans=0.04949747468305833 2024-08-15 04:21:04,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10500, loss[loss=0.11, beats_loss=0.009108, ecapa_loss=0.000165, whisper_loss=0.09927, over 14793.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001526, whisper_loss=0.09077, over 3900367.04 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:21:10,672 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-15 04:21:13,710 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 04:21:19,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3003450.0, ans=0.2 2024-08-15 04:21:19,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3003450.0, ans=0.125 2024-08-15 04:21:25,728 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 04:21:27,124 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-15 04:21:35,236 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 04:21:54,327 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.267e+01 2.471e+01 2.846e+01 8.765e+01, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 04:22:03,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-15 04:22:04,191 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 04:22:06,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3003750.0, ans=0.09899494936611666 2024-08-15 04:22:08,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3003750.0, ans=0.125 2024-08-15 04:22:12,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10550, loss[loss=0.09395, beats_loss=0.01162, ecapa_loss=0.0001429, whisper_loss=0.08091, over 21355.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001524, whisper_loss=0.08999, over 3883205.69 frames. ], batch size: 86, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:22:13,039 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 11 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 04:22:16,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2024-08-15 04:22:18,648 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 04:22:28,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3003950.0, ans=0.0 2024-08-15 04:22:58,663 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 04:23:01,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3004150.0, ans=0.1 2024-08-15 04:23:02,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3004150.0, ans=0.125 2024-08-15 04:23:09,147 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-15 04:23:21,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10600, loss[loss=0.1096, beats_loss=0.01005, ecapa_loss=0.0001745, whisper_loss=0.09777, over 21749.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0107, ecapa_loss=0.0001532, whisper_loss=0.08971, over 3882696.35 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:23:22,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3004350.0, ans=0.125 2024-08-15 04:23:37,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-15 04:23:46,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3004450.0, ans=0.04949747468305833 2024-08-15 04:23:54,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3004550.0, ans=0.125 2024-08-15 04:23:55,961 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 04:24:10,949 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 04:24:12,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.388e+01 2.629e+01 3.044e+01 4.366e+02, threshold=5.258e+01, percent-clipped=2.0 2024-08-15 04:24:14,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3004650.0, ans=0.0 2024-08-15 04:24:17,927 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 04:24:30,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10650, loss[loss=0.1102, beats_loss=0.0106, ecapa_loss=0.000152, whisper_loss=0.0981, over 21632.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001523, whisper_loss=0.08984, over 3859069.09 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:24:35,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3004850.0, ans=0.1 2024-08-15 04:24:39,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3004850.0, ans=0.125 2024-08-15 04:24:50,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. 
limit=15.0 2024-08-15 04:24:56,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3004950.0, ans=0.1 2024-08-15 04:24:58,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3004950.0, ans=0.0 2024-08-15 04:25:21,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3005150.0, ans=0.0 2024-08-15 04:25:30,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3005250.0, ans=0.125 2024-08-15 04:25:33,241 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-15 04:25:37,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3005250.0, ans=0.125 2024-08-15 04:25:37,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3005250.0, ans=0.0 2024-08-15 04:25:39,739 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 35 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 04:25:42,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-08-15 04:25:43,965 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10700, loss[loss=0.09054, beats_loss=0.0111, ecapa_loss=0.0001439, whisper_loss=0.07801, over 14617.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.000152, whisper_loss=0.0907, over 3870122.45 frames. 
], batch size: 57, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:26:00,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3005450.0, ans=0.1 2024-08-15 04:26:14,769 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 04:26:19,995 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-15 04:26:22,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.81 vs. limit=10.0 2024-08-15 04:26:39,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.322e+01 2.552e+01 2.947e+01 1.324e+02, threshold=5.105e+01, percent-clipped=0.0 2024-08-15 04:26:59,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10750, loss[loss=0.1048, beats_loss=0.01195, ecapa_loss=0.0001481, whisper_loss=0.09142, over 22900.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001504, whisper_loss=0.09046, over 3876863.06 frames. ], batch size: 92, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:27:21,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3005950.0, ans=0.125 2024-08-15 04:28:16,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10800, loss[loss=0.1182, beats_loss=0.009419, ecapa_loss=0.0001381, whisper_loss=0.1074, over 20705.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001508, whisper_loss=0.09076, over 3900260.39 frames. ], batch size: 81, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:28:21,696 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 04:28:26,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3006350.0, ans=0.125 2024-08-15 04:28:37,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3006450.0, ans=0.0 2024-08-15 04:29:12,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.426e+01 2.732e+01 3.113e+01 1.619e+02, threshold=5.464e+01, percent-clipped=2.0 2024-08-15 04:29:31,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10850, loss[loss=0.1152, beats_loss=0.009866, ecapa_loss=0.0001186, whisper_loss=0.1042, over 15999.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01069, ecapa_loss=0.0001504, whisper_loss=0.09122, over 3897688.63 frames. ], batch size: 61, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:29:34,646 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-15 04:29:44,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3006850.0, ans=0.015 2024-08-15 04:29:47,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3006950.0, ans=0.125 2024-08-15 04:30:03,178 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 04:30:26,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3007150.0, ans=0.1 2024-08-15 04:30:26,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3007150.0, ans=0.0 2024-08-15 04:30:44,010 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 04:30:47,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3007250.0, ans=0.125 2024-08-15 04:30:50,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10900, loss[loss=0.1035, beats_loss=0.01114, ecapa_loss=0.0001842, whisper_loss=0.09047, over 22252.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001518, whisper_loss=0.09111, over 3893450.58 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:31:01,159 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 04:31:03,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3007350.0, ans=0.09899494936611666 2024-08-15 04:31:21,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3007550.0, ans=0.07 2024-08-15 04:31:22,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=12.0 2024-08-15 04:31:27,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3007550.0, ans=0.1 2024-08-15 04:31:37,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.97 vs. 
limit=5.0 2024-08-15 04:31:47,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.315e+01 2.550e+01 2.913e+01 4.386e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-15 04:31:58,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3007750.0, ans=0.125 2024-08-15 04:32:06,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 10950, loss[loss=0.1113, beats_loss=0.01052, ecapa_loss=0.0002079, whisper_loss=0.09873, over 21801.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01059, ecapa_loss=0.0001518, whisper_loss=0.09204, over 3939253.82 frames. ], batch size: 92, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:32:24,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=8.0 2024-08-15 04:32:26,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3007950.0, ans=0.125 2024-08-15 04:32:33,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3007950.0, ans=0.125 2024-08-15 04:32:39,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-15 04:32:48,112 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
13 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 04:33:10,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3008250.0, ans=0.2 2024-08-15 04:33:12,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3008250.0, ans=0.0 2024-08-15 04:33:22,425 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11000, loss[loss=0.09498, beats_loss=0.00898, ecapa_loss=0.0001665, whisper_loss=0.08434, over 15772.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001525, whisper_loss=0.09132, over 3918186.50 frames. ], batch size: 64, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:33:34,558 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 04:33:45,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3008450.0, ans=0.125 2024-08-15 04:34:07,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3008650.0, ans=0.125 2024-08-15 04:34:12,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0 2024-08-15 04:34:17,771 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
16 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 04:34:19,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3008650.0, ans=0.04949747468305833 2024-08-15 04:34:20,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.435e+01 2.579e+01 2.993e+01 2.045e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-15 04:34:22,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3008750.0, ans=0.1 2024-08-15 04:34:25,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3008750.0, ans=0.015 2024-08-15 04:34:31,616 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 18 from Vox, 34 from AS 2024-08-15 04:34:36,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3008850.0, ans=0.2 2024-08-15 04:34:37,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11050, loss[loss=0.07733, beats_loss=0.01265, ecapa_loss=0.0001682, whisper_loss=0.063, over 20964.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01054, ecapa_loss=0.0001524, whisper_loss=0.09194, over 3953170.33 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:34:44,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2024-08-15 04:34:45,522 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 from AS 2024-08-15 04:34:53,131 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 17 from Vox, 33 from AS 2024-08-15 04:34:58,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3008950.0, ans=0.125 2024-08-15 04:35:10,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3009050.0, ans=0.125 2024-08-15 04:35:11,276 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 from AS 2024-08-15 04:35:20,756 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 04:35:32,504 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 from AS 2024-08-15 04:35:40,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5 2024-08-15 04:35:52,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11100, loss[loss=0.1031, beats_loss=0.01019, ecapa_loss=0.0001393, whisper_loss=0.09149, over 22497.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01051, ecapa_loss=0.0001516, whisper_loss=0.09213, over 3946398.53 frames. ], batch size: 87, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:35:56,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3009350.0, ans=0.125 2024-08-15 04:36:01,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3009350.0, ans=0.125 2024-08-15 04:36:11,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3009450.0, ans=0.125 2024-08-15 04:36:15,437 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 16 from Vox, 25 from AS 2024-08-15 04:36:24,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3009550.0, ans=0.2 2024-08-15 04:36:48,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.388e+01 2.670e+01 2.959e+01 6.163e+01, threshold=5.341e+01, percent-clipped=1.0 2024-08-15 04:36:56,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3009750.0, ans=0.125 2024-08-15 04:37:07,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11150, loss[loss=0.1033, beats_loss=0.01073, ecapa_loss=0.0001746, whisper_loss=0.09086, over 21675.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01053, ecapa_loss=0.0001518, whisper_loss=0.09203, over 3960945.42 frames. ], batch size: 89, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:37:29,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-08-15 04:37:42,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3010050.0, ans=0.125 2024-08-15 04:37:45,322 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 04:37:48,108 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 30 from Vox, 31 from AS 2024-08-15 04:38:03,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3010150.0, ans=0.125 2024-08-15 04:38:14,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3010250.0, ans=0.125 2024-08-15 04:38:17,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3010250.0, ans=0.0 2024-08-15 04:38:19,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11200, loss[loss=0.1198, beats_loss=0.01113, ecapa_loss=0.000161, whisper_loss=0.1071, over 18068.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001523, whisper_loss=0.0916, over 3925197.99 frames. ], batch size: 75, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:38:21,464 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 04:38:25,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3010350.0, ans=0.125 2024-08-15 04:39:14,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3010650.0, ans=0.125 2024-08-15 04:39:15,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.332e+01 2.561e+01 2.829e+01 4.358e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-15 04:39:33,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11250, loss[loss=0.1094, beats_loss=0.01177, ecapa_loss=0.0001262, whisper_loss=0.09639, over 21333.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01048, ecapa_loss=0.0001526, whisper_loss=0.09227, over 3930563.28 frames. 
], batch size: 83, lr: 2.95e-03, grad_scale: 5.764607523034235e+17 2024-08-15 04:39:44,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3010850.0, ans=0.0 2024-08-15 04:40:17,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.30 vs. limit=10.0 2024-08-15 04:40:21,464 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 04:40:42,599 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 26 from Vox, 33 from AS 2024-08-15 04:40:50,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11300, loss[loss=0.09666, beats_loss=0.01229, ecapa_loss=0.0001477, whisper_loss=0.08289, over 21771.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01045, ecapa_loss=0.0001524, whisper_loss=0.09198, over 3932129.22 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:40:57,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=22.5 2024-08-15 04:41:03,486 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 from AS 2024-08-15 04:41:10,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3011450.0, ans=0.125 2024-08-15 04:41:21,224 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
11 from LS+wenet, 16 from Vox, 38 from AS 2024-08-15 04:41:25,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3011550.0, ans=0.0 2024-08-15 04:41:30,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3011550.0, ans=0.0 2024-08-15 04:41:48,439 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 20 from Vox, 19 from AS 2024-08-15 04:41:52,937 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.562e+01 2.942e+01 5.561e+01, threshold=5.125e+01, percent-clipped=1.0 2024-08-15 04:42:10,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11350, loss[loss=0.1124, beats_loss=0.01088, ecapa_loss=0.0001308, whisper_loss=0.1002, over 23666.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.09076, over 3889833.82 frames. ], batch size: 93, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:42:10,189 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 from AS 2024-08-15 04:42:20,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3011850.0, ans=0.1 2024-08-15 04:42:27,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3011950.0, ans=0.125 2024-08-15 04:42:35,901 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
19 from LS+wenet, 14 from Vox, 23 from AS 2024-08-15 04:43:02,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3012150.0, ans=0.0 2024-08-15 04:43:15,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3012250.0, ans=0.0 2024-08-15 04:43:15,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3012250.0, ans=0.1 2024-08-15 04:43:17,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3012250.0, ans=0.0 2024-08-15 04:43:17,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2024-08-15 04:43:25,268 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11400, loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001625, whisper_loss=0.09095, over 14390.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001525, whisper_loss=0.09099, over 3876686.57 frames. ], batch size: 55, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:43:30,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3012350.0, ans=0.1 2024-08-15 04:43:42,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-15 04:44:11,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3012650.0, ans=0.125 2024-08-15 04:44:12,533 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 24 from Vox, 33 from AS 2024-08-15 04:44:14,243 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 04:44:19,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3012650.0, ans=0.0 2024-08-15 04:44:22,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3012650.0, ans=0.0 2024-08-15 04:44:22,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.412e+01 2.712e+01 2.971e+01 3.918e+01, threshold=5.424e+01, percent-clipped=0.0 2024-08-15 04:44:22,972 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 28 from LS+wenet, 15 from Vox, 21 from AS 2024-08-15 04:44:39,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11450, loss[loss=0.09586, beats_loss=0.01169, ecapa_loss=0.0001573, whisper_loss=0.0826, over 22380.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001535, whisper_loss=0.09133, over 3885472.88 frames. 
], batch size: 92, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:44:42,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3012850.0, ans=0.0 2024-08-15 04:44:46,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3012850.0, ans=0.125 2024-08-15 04:44:56,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3012950.0, ans=0.0 2024-08-15 04:45:04,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3012950.0, ans=0.0 2024-08-15 04:45:30,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3013150.0, ans=0.125 2024-08-15 04:45:36,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2024-08-15 04:45:41,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-08-15 04:45:49,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3013250.0, ans=0.0 2024-08-15 04:45:55,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11500, loss[loss=0.115, beats_loss=0.008653, ecapa_loss=0.0001668, whisper_loss=0.1047, over 17326.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01044, ecapa_loss=0.0001543, whisper_loss=0.09197, over 3865764.93 frames. ], batch size: 69, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:46:06,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. 
limit=15.0 2024-08-15 04:46:13,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3013450.0, ans=0.125 2024-08-15 04:46:17,805 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 27 from Vox, 27 from AS 2024-08-15 04:46:31,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3013550.0, ans=0.2 2024-08-15 04:46:34,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3013550.0, ans=0.5 2024-08-15 04:46:45,226 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 from AS 2024-08-15 04:46:51,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3013650.0, ans=0.07 2024-08-15 04:46:52,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.370e+01 2.550e+01 2.848e+01 7.027e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-15 04:46:59,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3013750.0, ans=0.125 2024-08-15 04:47:04,480 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 from AS 2024-08-15 04:47:06,028 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 from AS 2024-08-15 04:47:08,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11550, loss[loss=0.09308, beats_loss=0.01197, ecapa_loss=0.0001356, whisper_loss=0.07975, over 18226.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01044, ecapa_loss=0.0001553, whisper_loss=0.09215, over 3872524.79 frames. 
], batch size: 74, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:47:29,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3013950.0, ans=0.125 2024-08-15 04:47:53,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3014050.0, ans=0.0 2024-08-15 04:48:29,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11600, loss[loss=0.1032, beats_loss=0.01027, ecapa_loss=0.0001659, whisper_loss=0.09123, over 21609.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001541, whisper_loss=0.0909, over 3862904.97 frames. ], batch size: 90, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:48:48,930 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.291e+01 2024-08-15 04:48:52,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2024-08-15 04:49:05,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3014550.0, ans=0.1 2024-08-15 04:49:10,527 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 04:49:17,556 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 24 from Vox, 26 from AS 2024-08-15 04:49:32,698 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.366e+01 2.590e+01 2.931e+01 3.199e+02, threshold=5.179e+01, percent-clipped=2.0 2024-08-15 04:49:33,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3014750.0, ans=0.1 2024-08-15 04:49:49,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11650, loss[loss=0.1105, beats_loss=0.01049, ecapa_loss=0.0001245, whisper_loss=0.09874, over 18458.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001534, whisper_loss=0.09072, over 3850749.82 frames. ], batch size: 71, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:50:05,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-15 04:50:35,087 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 from AS 2024-08-15 04:50:43,272 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-15 04:51:06,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11700, loss[loss=0.1187, beats_loss=0.006853, ecapa_loss=0.0002293, whisper_loss=0.1096, over 15427.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001543, whisper_loss=0.09129, over 3865001.75 frames. 
], batch size: 67, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:51:14,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3015350.0, ans=0.0 2024-08-15 04:51:16,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3015350.0, ans=0.125 2024-08-15 04:51:24,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.64 vs. limit=10.0 2024-08-15 04:51:25,326 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 from AS 2024-08-15 04:51:27,037 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 from AS 2024-08-15 04:51:28,612 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 17 from Vox, 37 from AS 2024-08-15 04:51:44,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3015550.0, ans=0.2 2024-08-15 04:51:58,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3015650.0, ans=0.125 2024-08-15 04:52:08,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.391e+01 2.584e+01 2.894e+01 1.234e+02, threshold=5.167e+01, percent-clipped=2.0 2024-08-15 04:52:20,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3015750.0, ans=0.125 2024-08-15 04:52:26,642 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11750, loss[loss=0.07446, beats_loss=0.01502, ecapa_loss=0.000131, whisper_loss=0.05813, over 18920.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01065, ecapa_loss=0.0001531, whisper_loss=0.09131, over 3885307.36 frames. 
], batch size: 80, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:52:28,547 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS 2024-08-15 04:52:40,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3015850.0, ans=0.0 2024-08-15 04:52:43,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3015950.0, ans=0.125 2024-08-15 04:52:52,030 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 from AS 2024-08-15 04:52:53,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2024-08-15 04:52:57,079 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-15 04:52:57,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3016050.0, ans=0.125 2024-08-15 04:53:00,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2024-08-15 04:53:16,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.88 vs. limit=5.0 2024-08-15 04:53:23,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=12.0 2024-08-15 04:53:33,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.56 vs. 
limit=15.0 2024-08-15 04:53:47,777 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11800, loss[loss=0.08409, beats_loss=0.007309, ecapa_loss=0.0001582, whisper_loss=0.0752, over 16195.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01063, ecapa_loss=0.0001526, whisper_loss=0.09209, over 3888745.27 frames. ], batch size: 63, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:53:55,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2024-08-15 04:54:00,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3016350.0, ans=0.1 2024-08-15 04:54:21,992 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 04:54:32,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3016650.0, ans=0.0 2024-08-15 04:54:33,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3016650.0, ans=0.125 2024-08-15 04:54:34,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3016650.0, ans=0.125 2024-08-15 04:54:38,959 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
16 from LS+wenet, 16 from Vox, 23 from AS 2024-08-15 04:54:47,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.361e+01 2.696e+01 3.037e+01 7.582e+01, threshold=5.392e+01, percent-clipped=2.0 2024-08-15 04:54:56,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3016750.0, ans=0.2 2024-08-15 04:55:05,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11850, loss[loss=0.08886, beats_loss=0.01166, ecapa_loss=0.0001578, whisper_loss=0.07562, over 21968.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01067, ecapa_loss=0.000152, whisper_loss=0.09197, over 3906086.60 frames. ], batch size: 91, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:55:09,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3016850.0, ans=0.0 2024-08-15 04:55:21,629 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 04:55:42,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3017050.0, ans=0.0 2024-08-15 04:55:49,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3017150.0, ans=0.0 2024-08-15 04:55:59,280 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 04:56:02,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3017150.0, ans=0.0 2024-08-15 04:56:08,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3017250.0, ans=0.125 2024-08-15 04:56:20,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11900, loss[loss=0.103, beats_loss=0.009281, ecapa_loss=0.0002185, whisper_loss=0.0915, over 20443.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01067, ecapa_loss=0.0001525, whisper_loss=0.09246, over 3923357.74 frames. ], batch size: 87, lr: 2.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:56:33,040 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 from AS 2024-08-15 04:57:00,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3017550.0, ans=0.0 2024-08-15 04:57:20,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.277e+01 2.486e+01 2.850e+01 3.770e+01, threshold=4.972e+01, percent-clipped=0.0 2024-08-15 04:57:27,633 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 22 from Vox, 27 from AS 2024-08-15 04:57:35,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 11950, loss[loss=0.106, beats_loss=0.008688, ecapa_loss=0.0001961, whisper_loss=0.09537, over 14274.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001531, whisper_loss=0.09165, over 3888781.28 frames. ], batch size: 57, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:57:54,343 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 04:57:57,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3017950.0, ans=0.2 2024-08-15 04:57:59,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3017950.0, ans=0.0 2024-08-15 04:58:00,371 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS 2024-08-15 04:58:09,423 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 19 from Vox, 21 from AS 2024-08-15 04:58:48,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12000, loss[loss=0.09732, beats_loss=0.0126, ecapa_loss=0.0001177, whisper_loss=0.08354, over 22795.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001519, whisper_loss=0.09058, over 3877903.61 frames. ], batch size: 90, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 04:58:48,166 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 04:59:32,723 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005394, whisper_loss=0.2473, over 922467.00 frames. 2024-08-15 04:59:53,029 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on SV_voxceleb1: loss=0.004335, beats_loss=0, ecapa_loss=0.0004335, whisper_loss=0, over 939242.00 frames. 2024-08-15 05:01:54,726 INFO [train_multi_KD3.py:1149] (3/4) Epoch 21, validation on AT_audioset: loss=0.02336, beats_loss=0.02336, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 05:01:54,737 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 05:02:33,288 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 from AS 2024-08-15 05:02:34,834 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 20 from Vox, 21 from AS 2024-08-15 05:02:47,342 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS 2024-08-15 05:02:49,081 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-15 05:02:50,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.329e+01 2.556e+01 2.882e+01 4.155e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-15 05:03:04,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12050, loss[loss=0.1138, beats_loss=0.009971, ecapa_loss=0.0001299, whisper_loss=0.1025, over 15848.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01076, ecapa_loss=0.0001514, whisper_loss=0.09016, over 3865092.88 frames. 
], batch size: 61, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 05:03:18,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3018950.0, ans=0.0 2024-08-15 05:03:32,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.13 vs. limit=22.5 2024-08-15 05:03:34,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3019050.0, ans=0.5 2024-08-15 05:03:41,809 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 from AS 2024-08-15 05:03:45,940 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 from AS 2024-08-15 05:03:52,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3019150.0, ans=0.125 2024-08-15 05:03:54,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3019150.0, ans=0.0 2024-08-15 05:03:54,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3019150.0, ans=0.125 2024-08-15 05:03:58,249 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 05:04:05,501 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 from AS 2024-08-15 05:04:13,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12100, loss[loss=0.08764, beats_loss=0.01046, ecapa_loss=0.0001719, whisper_loss=0.07546, over 22852.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01074, ecapa_loss=0.0001525, whisper_loss=0.09005, over 3880826.40 frames. 
], batch size: 94, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:04:19,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3019350.0, ans=0.125
2024-08-15 05:04:24,857 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 05:04:33,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3019450.0, ans=0.125
2024-08-15 05:04:33,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3019450.0, ans=0.0
2024-08-15 05:04:37,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3019450.0, ans=0.1
2024-08-15 05:04:39,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3019450.0, ans=0.125
2024-08-15 05:04:41,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3019550.0, ans=0.1
2024-08-15 05:04:41,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3019550.0, ans=0.125
2024-08-15 05:04:46,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs.
limit=6.0
2024-08-15 05:05:09,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.350e+01 2.548e+01 2.785e+01 3.671e+01, threshold=5.096e+01, percent-clipped=0.0
2024-08-15 05:05:14,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3019750.0, ans=0.125
2024-08-15 05:05:26,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12150, loss[loss=0.09816, beats_loss=0.01056, ecapa_loss=0.0001309, whisper_loss=0.08629, over 15970.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01071, ecapa_loss=0.0001531, whisper_loss=0.09027, over 3896905.45 frames. ], batch size: 62, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:05:28,675 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 18 from Vox, 44 from AS
2024-08-15 05:05:32,058 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 from AS
2024-08-15 05:05:33,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3019850.0, ans=0.125
2024-08-15 05:05:39,423 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS
2024-08-15 05:05:45,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3019950.0, ans=0.125
2024-08-15 05:05:54,634 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 from AS
2024-08-15 05:06:20,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3020150.0, ans=0.125
2024-08-15 05:06:46,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12200, loss[loss=0.1014, beats_loss=0.01123, ecapa_loss=0.0001529, whisper_loss=0.08868, over 21697.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001513, whisper_loss=0.09021, over 3873116.44 frames.
], batch size: 91, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:07:02,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3020450.0, ans=0.1
2024-08-15 05:07:04,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0
2024-08-15 05:07:06,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5
2024-08-15 05:07:06,769 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS
2024-08-15 05:07:07,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3020450.0, ans=0.125
2024-08-15 05:07:19,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3020550.0, ans=0.07
2024-08-15 05:07:45,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.310e+01 2.623e+01 3.026e+01 6.571e+01, threshold=5.245e+01, percent-clipped=3.0
2024-08-15 05:07:54,280 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS
2024-08-15 05:08:03,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12250, loss[loss=0.07924, beats_loss=0.01316, ecapa_loss=0.0001503, whisper_loss=0.06458, over 20019.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001524, whisper_loss=0.09048, over 3915827.45 frames.
], batch size: 83, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:08:09,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3020850.0, ans=0.1
2024-08-15 05:08:19,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3020950.0, ans=0.0
2024-08-15 05:08:20,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=12.0
2024-08-15 05:08:54,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0
2024-08-15 05:08:57,049 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 from AS
2024-08-15 05:09:20,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12300, loss[loss=0.08935, beats_loss=0.009807, ecapa_loss=0.000155, whisper_loss=0.078, over 23730.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001532, whisper_loss=0.09072, over 3895419.01 frames. ], batch size: 94, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:09:35,023 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS
2024-08-15 05:09:58,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.48 vs.
limit=22.5
2024-08-15 05:10:17,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3021650.0, ans=0.0
2024-08-15 05:10:21,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3021650.0, ans=0.0
2024-08-15 05:10:22,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.390e+01 2.646e+01 2.944e+01 2.237e+02, threshold=5.293e+01, percent-clipped=1.0
2024-08-15 05:10:38,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12350, loss[loss=0.1091, beats_loss=0.009279, ecapa_loss=0.000136, whisper_loss=0.09843, over 14943.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001541, whisper_loss=0.09032, over 3865339.27 frames. ], batch size: 55, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:10:38,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3021850.0, ans=0.2
2024-08-15 05:10:46,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3021850.0, ans=0.0
2024-08-15 05:10:59,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3021950.0, ans=0.0
2024-08-15 05:11:15,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3022050.0, ans=0.0
2024-08-15 05:11:23,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3022150.0, ans=0.0
2024-08-15 05:11:27,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3022150.0, ans=0.0
2024-08-15 05:11:30,918 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
21 from LS+wenet, 16 from Vox, 30 from AS
2024-08-15 05:11:33,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0
2024-08-15 05:11:35,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0
2024-08-15 05:11:50,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12400, loss[loss=0.1089, beats_loss=0.009367, ecapa_loss=0.0001626, whisper_loss=0.09792, over 16514.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001547, whisper_loss=0.09026, over 3828109.78 frames. ], batch size: 68, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:11:54,744 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 from AS
2024-08-15 05:12:00,110 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 from AS
2024-08-15 05:12:00,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3022350.0, ans=0.0
2024-08-15 05:12:13,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3022450.0, ans=0.0
2024-08-15 05:12:33,924 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 14 from Vox, 25 from AS
2024-08-15 05:12:42,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.329e+01 2.587e+01 2.851e+01 3.829e+01, threshold=5.175e+01, percent-clipped=0.0
2024-08-15 05:12:52,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0
2024-08-15 05:12:54,949 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
17 from LS+wenet, 15 from Vox, 39 from AS
2024-08-15 05:12:57,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.97 vs. limit=22.5
2024-08-15 05:12:58,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12450, loss[loss=0.1081, beats_loss=0.009759, ecapa_loss=0.0001209, whisper_loss=0.09714, over 17824.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001536, whisper_loss=0.0894, over 3814189.34 frames. ], batch size: 65, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:13:29,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3023050.0, ans=0.125
2024-08-15 05:13:36,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3023050.0, ans=0.1
2024-08-15 05:13:37,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3023150.0, ans=0.125
2024-08-15 05:13:43,034 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 05:13:52,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3023250.0, ans=0.125
2024-08-15 05:13:57,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0
2024-08-15 05:13:58,387 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 from AS
2024-08-15 05:14:04,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12500, loss[loss=0.1162, beats_loss=0.009671, ecapa_loss=0.0001613, whisper_loss=0.1049, over 16486.00 frames.
], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001526, whisper_loss=0.08999, over 3864749.36 frames. ], batch size: 66, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:14:13,101 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 from AS
2024-08-15 05:14:17,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3023450.0, ans=0.07
2024-08-15 05:14:23,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3023450.0, ans=0.0
2024-08-15 05:14:36,026 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 from AS
2024-08-15 05:14:43,025 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 26 from Vox, 25 from AS
2024-08-15 05:14:57,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.569e+01 2.941e+01 3.163e+02, threshold=5.138e+01, percent-clipped=2.0
2024-08-15 05:15:07,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3023750.0, ans=0.2
2024-08-15 05:15:09,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3023750.0, ans=0.0
2024-08-15 05:15:09,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3023750.0, ans=0.125
2024-08-15 05:15:12,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12550, loss[loss=0.09579, beats_loss=0.01205, ecapa_loss=0.0001501, whisper_loss=0.08223, over 23677.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001528, whisper_loss=0.0904, over 3873374.55 frames.
], batch size: 95, lr: 2.94e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 05:15:20,928 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 from AS
2024-08-15 05:15:27,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3023950.0, ans=0.125
2024-08-15 05:15:33,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0
2024-08-15 05:15:43,618 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 24 from Vox, 25 from AS
2024-08-15 05:15:44,955 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 from AS
2024-08-15 05:15:53,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3024150.0, ans=0.125
2024-08-15 05:15:56,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0
2024-08-15 05:15:56,850 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 from AS
2024-08-15 05:16:10,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3024250.0, ans=0.0
2024-08-15 05:16:10,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=12.0
2024-08-15 05:16:12,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3024250.0, ans=0.07
2024-08-15 05:16:17,567 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
32 from LS+wenet, 20 from Vox, 38 from AS
2024-08-15 05:16:20,268 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12600, loss[loss=0.1175, beats_loss=0.008466, ecapa_loss=0.0001854, whisper_loss=0.1072, over 15588.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001523, whisper_loss=0.09105, over 3840298.13 frames. ], batch size: 63, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:16:32,393 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 29 from Vox, 43 from AS
2024-08-15 05:16:42,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3024450.0, ans=0.125
2024-08-15 05:16:45,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3024450.0, ans=0.2
2024-08-15 05:16:46,184 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS
2024-08-15 05:16:47,484 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 from AS
2024-08-15 05:16:47,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3024550.0, ans=0.2
2024-08-15 05:16:50,148 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 30 from Vox, 35 from AS
2024-08-15 05:16:50,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3024550.0, ans=0.125
2024-08-15 05:16:55,118 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 from AS
2024-08-15 05:17:01,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs.
limit=15.0
2024-08-15 05:17:13,052 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.071e-02
2024-08-15 05:17:13,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.257e+01 2.680e+01 2.970e+01 2.910e+02, threshold=5.361e+01, percent-clipped=1.0
2024-08-15 05:17:27,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12650, loss[loss=0.1053, beats_loss=0.01117, ecapa_loss=0.0001494, whisper_loss=0.09259, over 22096.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001512, whisper_loss=0.09015, over 3870689.94 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:17:33,687 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 18 from LS+wenet, 35 from Vox, 40 from AS
2024-08-15 05:17:39,321 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS
2024-08-15 05:17:51,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3024950.0, ans=0.125
2024-08-15 05:17:53,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=15.0
2024-08-15 05:17:55,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3025050.0, ans=0.0
2024-08-15 05:17:58,027 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
24 from LS+wenet, 15 from Vox, 32 from AS
2024-08-15 05:18:11,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3025150.0, ans=0.125
2024-08-15 05:18:30,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3025250.0, ans=0.125
2024-08-15 05:18:33,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12700, loss[loss=0.09962, beats_loss=0.01227, ecapa_loss=0.0001398, whisper_loss=0.08596, over 21835.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001499, whisper_loss=0.09072, over 3886792.15 frames. ], batch size: 89, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:19:14,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.80 vs. limit=22.5
2024-08-15 05:19:26,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.352e+01 2.609e+01 2.982e+01 1.854e+02, threshold=5.218e+01, percent-clipped=2.0
2024-08-15 05:19:28,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.99 vs. limit=10.0
2024-08-15 05:19:29,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3025750.0, ans=0.125
2024-08-15 05:19:33,627 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 21 from LS+wenet, 34 from Vox, 40 from AS
2024-08-15 05:19:39,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12750, loss[loss=0.08555, beats_loss=0.01225, ecapa_loss=0.0001516, whisper_loss=0.07178, over 16868.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01075, ecapa_loss=0.000152, whisper_loss=0.09106, over 3868490.20 frames.
], batch size: 69, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:19:41,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3025850.0, ans=0.125
2024-08-15 05:20:11,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3026050.0, ans=0.1
2024-08-15 05:20:24,891 WARNING [optim.py:496] (3/4) Scaling gradients by 0.023750245571136475, model_norm_threshold=52.18341064453125
2024-08-15 05:20:25,068 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.496e+05, grad_sumsq=7.496e+05, orig_rms_sq=1.000e+00
2024-08-15 05:20:29,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3026150.0, ans=0.0
2024-08-15 05:20:31,693 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 from AS
2024-08-15 05:20:45,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12800, loss[loss=0.1128, beats_loss=0.01124, ecapa_loss=0.000155, whisper_loss=0.1, over 22144.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01085, ecapa_loss=0.0001526, whisper_loss=0.09051, over 3893672.62 frames. ], batch size: 87, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:21:28,683 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts.
28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-15 05:21:31,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3026650.0, ans=15.0
2024-08-15 05:21:35,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3026650.0, ans=0.0
2024-08-15 05:21:35,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3026650.0, ans=0.125
2024-08-15 05:21:39,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.369e+01 2.658e+01 2.978e+01 2.197e+03, threshold=5.317e+01, percent-clipped=3.0
2024-08-15 05:21:45,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0
2024-08-15 05:21:51,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3026850.0, ans=0.04949747468305833
2024-08-15 05:21:52,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12850, loss[loss=0.1084, beats_loss=0.01076, ecapa_loss=0.0001519, whisper_loss=0.09614, over 21207.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01084, ecapa_loss=0.000152, whisper_loss=0.09075, over 3888157.83 frames. ], batch size: 86, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:21:58,355 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 from AS
2024-08-15 05:22:02,562 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 from AS
2024-08-15 05:22:02,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3026850.0, ans=0.0
2024-08-15 05:22:21,207 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
23 from LS+wenet, 14 from Vox, 23 from AS
2024-08-15 05:22:30,393 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 from AS
2024-08-15 05:22:30,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3027050.0, ans=0.125
2024-08-15 05:22:42,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3027150.0, ans=0.2
2024-08-15 05:22:49,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3027250.0, ans=0.07
2024-08-15 05:22:59,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12900, loss[loss=0.1111, beats_loss=0.01021, ecapa_loss=0.0001599, whisper_loss=0.09926, over 19631.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01085, ecapa_loss=0.0001522, whisper_loss=0.0901, over 3858764.52 frames. ], batch size: 77, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:23:07,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3027350.0, ans=0.0
2024-08-15 05:23:18,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=12.0
2024-08-15 05:23:21,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3027450.0, ans=0.0
2024-08-15 05:23:36,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs.
limit=6.0
2024-08-15 05:23:39,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3027650.0, ans=0.0
2024-08-15 05:23:46,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0
2024-08-15 05:23:53,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.303e+01 2.501e+01 2.765e+01 4.358e+01, threshold=5.003e+01, percent-clipped=0.0
2024-08-15 05:24:03,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3027750.0, ans=0.0
2024-08-15 05:24:06,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 12950, loss[loss=0.09894, beats_loss=0.01282, ecapa_loss=0.0001234, whisper_loss=0.08489, over 17178.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01074, ecapa_loss=0.0001536, whisper_loss=0.09058, over 3855179.15 frames. ], batch size: 67, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:24:12,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3027850.0, ans=0.025
2024-08-15 05:24:15,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3027850.0, ans=0.2
2024-08-15 05:24:23,234 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 from AS
2024-08-15 05:24:29,226 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 10 from Vox, 48 from AS
2024-08-15 05:24:34,500 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 from AS
2024-08-15 05:24:44,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3028050.0, ans=0.0
2024-08-15 05:24:56,577 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts.
13 from LS+wenet, 15 from Vox, 35 from AS
2024-08-15 05:24:58,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3028150.0, ans=0.1
2024-08-15 05:25:13,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13000, loss[loss=0.09325, beats_loss=0.01019, ecapa_loss=0.0001645, whisper_loss=0.08142, over 16494.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01074, ecapa_loss=0.0001525, whisper_loss=0.08989, over 3863112.98 frames. ], batch size: 65, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:25:22,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0
2024-08-15 05:25:36,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3028450.0, ans=0.125
2024-08-15 05:25:37,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3028450.0, ans=0.2
2024-08-15 05:25:46,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0
2024-08-15 05:25:53,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3028650.0, ans=0.2
2024-08-15 05:26:07,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.449e+01 2.686e+01 3.098e+01 1.940e+02, threshold=5.373e+01, percent-clipped=2.0
2024-08-15 05:26:21,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13050, loss[loss=0.07413, beats_loss=0.01333, ecapa_loss=0.0001261, whisper_loss=0.05953, over 17079.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01077, ecapa_loss=0.0001516, whisper_loss=0.08979, over 3830734.29 frames.
], batch size: 69, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:26:34,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3028950.0, ans=0.125
2024-08-15 05:26:43,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3028950.0, ans=0.125
2024-08-15 05:26:58,988 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS
2024-08-15 05:27:10,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3029150.0, ans=0.05
2024-08-15 05:27:19,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0
2024-08-15 05:27:33,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13100, loss[loss=0.103, beats_loss=0.00792, ecapa_loss=0.0001709, whisper_loss=0.09339, over 19715.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001514, whisper_loss=0.09014, over 3870536.75 frames. ], batch size: 80, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17
2024-08-15 05:27:40,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3029350.0, ans=0.1
2024-08-15 05:27:42,939 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS
2024-08-15 05:27:49,800 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs.
limit=15.0 2024-08-15 05:28:33,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3029650.0, ans=0.1 2024-08-15 05:28:36,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.367e+01 2.624e+01 3.031e+01 1.630e+02, threshold=5.247e+01, percent-clipped=4.0 2024-08-15 05:28:50,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13150, loss[loss=0.1013, beats_loss=0.01157, ecapa_loss=0.0001476, whisper_loss=0.08823, over 23092.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01081, ecapa_loss=0.0001507, whisper_loss=0.09003, over 3878390.22 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:28:56,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.47 vs. limit=22.5 2024-08-15 05:29:00,567 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 05:29:09,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3029950.0, ans=0.125 2024-08-15 05:29:23,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-15 05:29:42,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3030150.0, ans=0.0 2024-08-15 05:29:44,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3030150.0, ans=0.1 2024-08-15 05:29:51,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. 
limit=22.5 2024-08-15 05:29:52,220 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 05:29:53,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3030250.0, ans=0.0 2024-08-15 05:29:59,367 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 05:29:59,731 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.919e-03 2024-08-15 05:30:02,781 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:30:07,876 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 05:30:09,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13200, loss[loss=0.1058, beats_loss=0.01103, ecapa_loss=0.0001323, whisper_loss=0.09346, over 23682.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001518, whisper_loss=0.09056, over 3884125.37 frames. ], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:30:12,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. 
limit=15.0 2024-08-15 05:30:27,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3030450.0, ans=0.125 2024-08-15 05:30:30,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3030450.0, ans=0.0 2024-08-15 05:30:39,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3030550.0, ans=0.125 2024-08-15 05:31:03,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3030650.0, ans=0.125 2024-08-15 05:31:11,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.273e+01 2.515e+01 2.855e+01 4.648e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-15 05:31:11,756 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 05:31:19,287 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 05:31:26,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13250, loss[loss=0.1044, beats_loss=0.009338, ecapa_loss=0.0001788, whisper_loss=0.09322, over 20024.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001522, whisper_loss=0.09101, over 3884774.64 frames. ], batch size: 82, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:31:42,522 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 05:31:52,222 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-15 05:31:59,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3031050.0, ans=0.125 2024-08-15 05:32:09,085 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
13 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 05:32:11,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-08-15 05:32:22,618 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 05:32:24,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3031150.0, ans=0.0 2024-08-15 05:32:27,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3031250.0, ans=0.125 2024-08-15 05:32:37,951 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 05:32:41,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3031350.0, ans=0.125 2024-08-15 05:32:41,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13300, loss[loss=0.09731, beats_loss=0.009334, ecapa_loss=0.0001339, whisper_loss=0.08663, over 19079.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.0001524, whisper_loss=0.09089, over 3884011.49 frames. ], batch size: 72, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:33:05,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3031450.0, ans=0.0 2024-08-15 05:33:41,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.389e+01 2.602e+01 2.951e+01 3.808e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-15 05:33:41,402 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 24 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-15 05:33:55,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13350, loss[loss=0.07897, beats_loss=0.009511, ecapa_loss=0.0001617, whisper_loss=0.06784, over 15009.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001513, whisper_loss=0.0908, over 3904316.02 frames. ], batch size: 59, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:34:22,809 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 05:34:27,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3032050.0, ans=0.125 2024-08-15 05:34:36,885 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-15 05:34:40,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3032150.0, ans=0.0 2024-08-15 05:34:45,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3032150.0, ans=0.09899494936611666 2024-08-15 05:34:48,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3032150.0, ans=0.09899494936611666 2024-08-15 05:35:06,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13400, loss[loss=0.1198, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.1081, over 17775.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.09059, over 3876009.23 frames. ], batch size: 70, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:35:10,280 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-15 05:35:10,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2024-08-15 05:35:14,314 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 05:35:19,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3032450.0, ans=0.2 2024-08-15 05:35:39,951 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 05:35:40,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3032550.0, ans=0.025 2024-08-15 05:35:40,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-15 05:35:51,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-15 05:36:02,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.320e+01 2.582e+01 2.828e+01 6.062e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 05:36:12,034 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-15 05:36:18,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13450, loss[loss=0.1311, beats_loss=0.01019, ecapa_loss=0.000137, whisper_loss=0.1195, over 22692.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.0001505, whisper_loss=0.09107, over 3868368.96 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:36:23,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3032850.0, ans=0.035 2024-08-15 05:36:25,604 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 05:36:30,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3032850.0, ans=0.1 2024-08-15 05:36:33,713 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 29 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 05:36:40,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.49 vs. limit=10.0 2024-08-15 05:36:55,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3033050.0, ans=0.125 2024-08-15 05:37:03,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3033150.0, ans=0.1 2024-08-15 05:37:06,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3033150.0, ans=0.125 2024-08-15 05:37:14,533 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 05:37:21,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2024-08-15 05:37:22,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=22.5 2024-08-15 05:37:24,813 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 05:37:30,425 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13500, loss[loss=0.09721, beats_loss=0.0111, ecapa_loss=0.0001839, whisper_loss=0.08427, over 21805.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001509, whisper_loss=0.09053, over 3865935.89 frames. 
], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:37:44,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-15 05:37:45,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3033450.0, ans=0.1 2024-08-15 05:37:48,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-15 05:38:18,975 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 05:38:19,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3033650.0, ans=0.2 2024-08-15 05:38:24,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3033650.0, ans=0.0 2024-08-15 05:38:26,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.315e+01 2.561e+01 2.861e+01 3.892e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-15 05:38:41,131 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13550, loss[loss=0.1125, beats_loss=0.01257, ecapa_loss=0.0001523, whisper_loss=0.09839, over 23267.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001496, whisper_loss=0.0908, over 3838490.01 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:38:43,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3033850.0, ans=0.0 2024-08-15 05:38:44,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. 
limit=15.0 2024-08-15 05:38:57,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3033950.0, ans=0.0 2024-08-15 05:39:03,983 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-15 05:39:06,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3033950.0, ans=0.125 2024-08-15 05:39:14,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3034050.0, ans=0.125 2024-08-15 05:39:35,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3034150.0, ans=0.1 2024-08-15 05:39:38,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3034250.0, ans=0.07 2024-08-15 05:39:42,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3034250.0, ans=0.1 2024-08-15 05:39:43,370 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-15 05:39:44,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3034250.0, ans=0.125 2024-08-15 05:39:53,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13600, loss[loss=0.07301, beats_loss=0.01358, ecapa_loss=0.0001252, whisper_loss=0.05818, over 21227.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01077, ecapa_loss=0.0001498, whisper_loss=0.09053, over 3847276.58 frames. ], batch size: 88, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:40:07,665 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
13 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 05:40:36,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3034550.0, ans=0.125 2024-08-15 05:40:37,408 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 05:40:43,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3034650.0, ans=0.1 2024-08-15 05:40:44,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2024-08-15 05:40:53,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.277e+01 2.545e+01 2.819e+01 3.866e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 05:41:08,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13650, loss[loss=0.08604, beats_loss=0.01142, ecapa_loss=0.0001658, whisper_loss=0.07296, over 21004.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01083, ecapa_loss=0.0001502, whisper_loss=0.09014, over 3834306.70 frames. ], batch size: 93, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:41:34,564 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-15 05:41:45,030 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-15 05:41:46,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3035050.0, ans=0.125 2024-08-15 05:41:49,137 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-15 05:41:49,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3035050.0, ans=0.2 2024-08-15 05:41:49,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-15 05:41:53,750 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 05:42:22,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13700, loss[loss=0.09957, beats_loss=0.009406, ecapa_loss=0.0001367, whisper_loss=0.0888, over 19212.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01078, ecapa_loss=0.0001508, whisper_loss=0.09073, over 3850590.51 frames. ], batch size: 70, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:42:33,254 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 05:42:51,409 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 05:42:51,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3035550.0, ans=0.0 2024-08-15 05:42:53,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3035550.0, ans=0.0 2024-08-15 05:42:54,409 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 7 from Vox, 29 fro AS 2024-08-15 05:42:59,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3035550.0, ans=0.125 2024-08-15 05:43:14,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3035650.0, ans=0.0 2024-08-15 05:43:23,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.259e+01 2.458e+01 2.754e+01 9.155e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-15 05:43:38,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13750, loss[loss=0.1056, beats_loss=0.01215, ecapa_loss=0.0001344, whisper_loss=0.09206, over 22900.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01081, ecapa_loss=0.0001499, whisper_loss=0.09078, over 3863722.52 frames. ], batch size: 91, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:43:44,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3035850.0, ans=0.125 2024-08-15 05:43:49,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3035850.0, ans=0.125 2024-08-15 05:43:52,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3035850.0, ans=0.0 2024-08-15 05:44:02,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2024-08-15 05:44:03,300 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
34 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-15 05:44:59,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3036350.0, ans=0.125 2024-08-15 05:45:00,475 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13800, loss[loss=0.1066, beats_loss=0.008632, ecapa_loss=0.0001119, whisper_loss=0.09684, over 16358.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001494, whisper_loss=0.09173, over 3861669.28 frames. ], batch size: 58, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:45:17,251 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 05:45:23,699 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 05:45:44,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0 2024-08-15 05:45:48,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3036650.0, ans=0.125 2024-08-15 05:45:48,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-08-15 05:45:50,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2024-08-15 05:46:05,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.299e+01 2.505e+01 2.770e+01 3.939e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-15 05:46:07,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3036750.0, ans=0.2 2024-08-15 05:46:12,036 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 05:46:18,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3036750.0, ans=0.125 2024-08-15 05:46:19,470 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 05:46:22,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13850, loss[loss=0.1132, beats_loss=0.008389, ecapa_loss=0.0001355, whisper_loss=0.1035, over 18664.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01068, ecapa_loss=0.0001495, whisper_loss=0.09125, over 3852354.00 frames. ], batch size: 68, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:46:27,899 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 05:46:44,036 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-15 05:46:53,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3037050.0, ans=0.0 2024-08-15 05:47:13,303 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-15 05:47:38,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3037250.0, ans=0.0 2024-08-15 05:47:42,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13900, loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.000142, whisper_loss=0.09091, over 22022.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0106, ecapa_loss=0.0001506, whisper_loss=0.09206, over 3868471.65 frames. 
], batch size: 92, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:47:46,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3037350.0, ans=0.125 2024-08-15 05:47:47,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3037350.0, ans=10.0 2024-08-15 05:47:55,271 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 05:48:02,558 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-15 05:48:21,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3037550.0, ans=0.0 2024-08-15 05:48:46,454 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 05:48:48,005 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-15 05:48:49,087 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05059259384870529, model_norm_threshold=50.10878372192383 2024-08-15 05:48:49,274 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.601e+05, grad_sumsq=3.601e+05, orig_rms_sq=1.000e+00 2024-08-15 05:48:51,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.379e+01 2.607e+01 2.936e+01 9.904e+02, threshold=5.213e+01, percent-clipped=4.0 2024-08-15 05:49:02,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3037750.0, ans=0.0 2024-08-15 05:49:04,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. 
limit=22.5 2024-08-15 05:49:06,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 13950, loss[loss=0.09646, beats_loss=0.01514, ecapa_loss=0.0001225, whisper_loss=0.0801, over 23403.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01065, ecapa_loss=0.0001497, whisper_loss=0.09202, over 3887160.15 frames. ], batch size: 94, lr: 2.94e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:49:19,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3037850.0, ans=0.2 2024-08-15 05:49:41,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3038050.0, ans=0.0 2024-08-15 05:49:48,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2024-08-15 05:49:49,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3038050.0, ans=0.125 2024-08-15 05:50:21,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2024-08-15 05:50:22,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3038250.0, ans=0.0 2024-08-15 05:50:30,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3038250.0, ans=0.1 2024-08-15 05:50:37,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3038250.0, ans=0.125 2024-08-15 05:50:44,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14000, loss[loss=0.09873, beats_loss=0.009126, ecapa_loss=0.0001229, whisper_loss=0.08837, over 15618.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001494, whisper_loss=0.09143, over 3902824.56 frames. ], batch size: 58, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:50:58,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3038350.0, ans=0.0 2024-08-15 05:50:58,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3038350.0, ans=0.0 2024-08-15 05:51:08,701 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 05:51:17,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3038450.0, ans=0.1 2024-08-15 05:51:54,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3038650.0, ans=0.0 2024-08-15 05:52:06,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3038650.0, ans=0.2 2024-08-15 05:52:11,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.343e+01 2.615e+01 2.930e+01 6.184e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-15 05:52:14,244 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-15 05:52:35,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14050, loss[loss=0.1035, beats_loss=0.01178, ecapa_loss=0.0001342, whisper_loss=0.09036, over 23630.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001498, whisper_loss=0.09193, over 3892176.42 frames. 
], batch size: 93, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:52:44,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3038850.0, ans=0.125 2024-08-15 05:52:49,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=22.5 2024-08-15 05:54:02,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3039250.0, ans=0.125 2024-08-15 05:54:04,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3039250.0, ans=0.0 2024-08-15 05:54:11,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3039250.0, ans=0.125 2024-08-15 05:54:15,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.71 vs. limit=6.0 2024-08-15 05:54:18,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14100, loss[loss=0.1245, beats_loss=0.009192, ecapa_loss=0.0001726, whisper_loss=0.1136, over 18625.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.09227, over 3886874.41 frames. ], batch size: 76, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:54:23,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-08-15 05:54:44,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.19 vs. 
limit=15.0 2024-08-15 05:54:55,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3039450.0, ans=0.125 2024-08-15 05:55:15,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-15 05:55:23,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3039650.0, ans=0.0 2024-08-15 05:55:27,904 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.350e+01 2.664e+01 3.020e+01 1.564e+02, threshold=5.328e+01, percent-clipped=1.0 2024-08-15 05:55:41,431 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14150, loss[loss=0.1158, beats_loss=0.01141, ecapa_loss=0.0001447, whisper_loss=0.103, over 23416.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.000151, whisper_loss=0.09183, over 3873293.18 frames. ], batch size: 93, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:55:43,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3039850.0, ans=0.125 2024-08-15 05:55:56,991 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-15 05:56:09,476 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 05:56:21,981 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 05:56:32,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3040150.0, ans=0.1 2024-08-15 05:56:33,278 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 05:56:41,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2024-08-15 05:56:46,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3040250.0, ans=0.1 2024-08-15 05:56:47,414 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-15 05:56:49,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3040250.0, ans=0.02 2024-08-15 05:56:57,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3040350.0, ans=0.95 2024-08-15 05:56:58,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14200, loss[loss=0.109, beats_loss=0.01106, ecapa_loss=0.000125, whisper_loss=0.09668, over 17583.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001501, whisper_loss=0.09106, over 3852444.23 frames. ], batch size: 68, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:57:01,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2024-08-15 05:57:11,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3040350.0, ans=0.125 2024-08-15 05:57:12,826 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 05:57:15,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3040450.0, ans=0.1 2024-08-15 05:57:35,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3040550.0, ans=0.125 2024-08-15 05:57:35,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3040550.0, ans=0.1 2024-08-15 05:57:46,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3040650.0, ans=0.0 2024-08-15 05:57:57,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-15 05:58:00,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.326e+01 2.592e+01 2.924e+01 6.304e+01, threshold=5.183e+01, percent-clipped=1.0 2024-08-15 05:58:02,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3040750.0, ans=0.125 2024-08-15 05:58:02,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-15 05:58:03,653 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 05:58:07,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2024-08-15 05:58:15,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14250, loss[loss=0.1087, beats_loss=0.009726, ecapa_loss=0.0001296, whisper_loss=0.09764, over 19211.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001493, whisper_loss=0.09153, over 3856832.41 frames. ], batch size: 73, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:58:36,829 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 05:58:45,360 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 05:58:49,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3041050.0, ans=0.2 2024-08-15 05:58:50,045 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 05:59:00,398 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.749e+00 2024-08-15 05:59:03,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3041150.0, ans=0.125 2024-08-15 05:59:06,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3041150.0, ans=0.1 2024-08-15 05:59:10,608 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 05:59:20,868 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 05:59:21,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3041250.0, ans=0.125 2024-08-15 05:59:36,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.47 vs. limit=22.5 2024-08-15 05:59:36,780 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14300, loss[loss=0.09911, beats_loss=0.01093, ecapa_loss=0.0001729, whisper_loss=0.08644, over 21914.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01063, ecapa_loss=0.0001497, whisper_loss=0.09156, over 3901629.09 frames. ], batch size: 92, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 05:59:43,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3041350.0, ans=0.125 2024-08-15 06:00:32,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3041650.0, ans=0.125 2024-08-15 06:00:36,735 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 06:00:44,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.480e+01 2.675e+01 2.988e+01 3.150e+02, threshold=5.350e+01, percent-clipped=2.0 2024-08-15 06:00:50,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3041750.0, ans=0.1 2024-08-15 06:00:54,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3041750.0, ans=0.0 2024-08-15 06:01:01,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14350, loss[loss=0.106, beats_loss=0.01132, ecapa_loss=0.0001678, whisper_loss=0.09303, over 20138.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001504, whisper_loss=0.09139, over 3896769.65 frames. 
], batch size: 87, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:01:02,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3041850.0, ans=0.125 2024-08-15 06:01:04,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3041850.0, ans=0.125 2024-08-15 06:01:26,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3041950.0, ans=0.1 2024-08-15 06:01:42,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3042050.0, ans=10.0 2024-08-15 06:01:54,275 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 06:01:55,849 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:01:56,832 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 06:01:58,577 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 06:01:59,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3042150.0, ans=0.2 2024-08-15 06:02:06,564 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 06:02:11,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3042250.0, ans=0.2 2024-08-15 06:02:12,091 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 06:02:14,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.38 vs. 
limit=15.0 2024-08-15 06:02:16,760 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-15 06:02:19,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14400, loss[loss=0.1268, beats_loss=0.00934, ecapa_loss=0.0001321, whisper_loss=0.1161, over 19868.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01061, ecapa_loss=0.0001506, whisper_loss=0.09168, over 3920829.55 frames. ], batch size: 75, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:02:48,657 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 06:03:18,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3042650.0, ans=0.125 2024-08-15 06:03:21,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=22.5 2024-08-15 06:03:22,622 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 06:03:23,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.353e+01 2.673e+01 3.020e+01 3.990e+01, threshold=5.347e+01, percent-clipped=0.0 2024-08-15 06:03:40,341 INFO [train_multi_KD3.py:1116] (3/4) Epoch 21, batch 14450, loss[loss=0.115, beats_loss=0.009099, ecapa_loss=0.0001358, whisper_loss=0.1046, over 23772.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001507, whisper_loss=0.09094, over 3936669.95 frames. ], batch size: 92, lr: 2.93e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:03:56,560 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 06:04:02,508 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
19 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 06:04:03,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3042950.0, ans=0.125 2024-08-15 06:04:13,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3043050.0, ans=0.125 2024-08-15 06:04:16,110 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 06:05:22,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 0, loss[loss=0.1038, beats_loss=0.009175, ecapa_loss=0.0001682, whisper_loss=0.09291, over 19893.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.009175, ecapa_loss=0.0001682, whisper_loss=0.09291, over 19893.00 frames. ], batch size: 81, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:05:22,772 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 06:06:01,394 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005383, whisper_loss=0.2468, over 922467.00 frames. 2024-08-15 06:06:18,378 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on SV_voxceleb1: loss=0.004241, beats_loss=0, ecapa_loss=0.0004241, whisper_loss=0, over 939242.00 frames. 2024-08-15 06:08:04,779 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 06:08:04,787 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 06:08:23,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3043270.0, ans=0.1 2024-08-15 06:08:29,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.43 vs. 
limit=10.0 2024-08-15 06:08:45,590 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 06:09:18,558 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 06:09:30,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3043570.0, ans=0.0 2024-08-15 06:09:47,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3043670.0, ans=0.0 2024-08-15 06:10:00,987 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.592e+01 2.838e+01 3.156e+01 2.932e+02, threshold=5.677e+01, percent-clipped=2.0 2024-08-15 06:10:05,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 50, loss[loss=0.1123, beats_loss=0.01025, ecapa_loss=0.0001304, whisper_loss=0.1008, over 22956.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009559, ecapa_loss=0.0001569, whisper_loss=0.09124, over 887855.60 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 1.4411518807585587e+17 2024-08-15 06:10:06,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3043770.0, ans=0.0 2024-08-15 06:10:10,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3043770.0, ans=0.125 2024-08-15 06:11:06,960 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 06:11:21,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-15 06:11:29,212 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
22 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-15 06:11:37,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3044170.0, ans=0.125 2024-08-15 06:11:40,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3044170.0, ans=0.125 2024-08-15 06:11:45,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0 2024-08-15 06:11:57,269 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 100, loss[loss=0.09038, beats_loss=0.01066, ecapa_loss=0.0001825, whisper_loss=0.07789, over 19948.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.00955, ecapa_loss=0.0001566, whisper_loss=0.09033, over 1511690.35 frames. ], batch size: 87, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:12:22,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3044370.0, ans=10.0 2024-08-15 06:12:41,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3044370.0, ans=0.125 2024-08-15 06:12:50,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5 2024-08-15 06:12:57,139 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:13:02,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. 
limit=10.0 2024-08-15 06:13:17,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3044570.0, ans=0.125 2024-08-15 06:13:19,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3044570.0, ans=0.125 2024-08-15 06:13:27,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2024-08-15 06:13:34,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3044670.0, ans=0.125 2024-08-15 06:13:43,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3044670.0, ans=0.125 2024-08-15 06:13:44,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-15 06:13:45,721 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 06:13:49,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.691e+01 2.918e+01 3.263e+01 8.817e+01, threshold=5.837e+01, percent-clipped=1.0 2024-08-15 06:13:54,352 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 150, loss[loss=0.1083, beats_loss=0.01153, ecapa_loss=0.0001576, whisper_loss=0.09523, over 22577.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.00955, ecapa_loss=0.0001575, whisper_loss=0.09005, over 2001461.39 frames. ], batch size: 90, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:13:56,666 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
19 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 06:14:08,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-15 06:14:10,444 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 06:14:52,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2024-08-15 06:15:24,077 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 06:15:28,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 200, loss[loss=0.1012, beats_loss=0.01129, ecapa_loss=0.0001671, whisper_loss=0.08828, over 19962.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009904, ecapa_loss=0.000157, whisper_loss=0.08997, over 2402905.83 frames. ], batch size: 79, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:15:37,195 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 06:15:52,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3045370.0, ans=0.125 2024-08-15 06:15:57,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3045370.0, ans=0.125 2024-08-15 06:16:03,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3045470.0, ans=0.125 2024-08-15 06:16:29,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3045570.0, ans=0.0 2024-08-15 06:16:33,437 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 06:16:44,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.328e+01 2.566e+01 2.862e+01 5.342e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-15 06:16:47,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 250, loss[loss=0.07242, beats_loss=0.01131, ecapa_loss=0.0001386, whisper_loss=0.05973, over 18324.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01012, ecapa_loss=0.0001554, whisper_loss=0.08975, over 2715299.18 frames. ], batch size: 73, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:16:49,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3045770.0, ans=0.2 2024-08-15 06:16:56,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3045770.0, ans=0.125 2024-08-15 06:17:05,414 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 06:17:11,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3045870.0, ans=0.0 2024-08-15 06:17:17,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3045970.0, ans=0.0 2024-08-15 06:17:43,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3046070.0, ans=0.1 2024-08-15 06:18:05,556 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 300, loss[loss=0.09793, beats_loss=0.009388, ecapa_loss=0.0001458, whisper_loss=0.08709, over 20328.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01019, ecapa_loss=0.0001562, whisper_loss=0.08952, over 2931257.21 frames. 
], batch size: 79, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:18:21,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3046370.0, ans=10.0 2024-08-15 06:18:24,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3046370.0, ans=0.125 2024-08-15 06:18:29,007 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 06:19:17,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3046670.0, ans=0.07 2024-08-15 06:19:19,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.288e+01 2.599e+01 2.904e+01 1.999e+02, threshold=5.198e+01, percent-clipped=4.0 2024-08-15 06:19:22,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 350, loss[loss=0.1094, beats_loss=0.01033, ecapa_loss=0.0001303, whisper_loss=0.09781, over 24057.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01022, ecapa_loss=0.0001555, whisper_loss=0.09034, over 3111000.55 frames. ], batch size: 93, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:19:24,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3046770.0, ans=0.125 2024-08-15 06:19:27,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3046770.0, ans=0.0 2024-08-15 06:19:37,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3046870.0, ans=0.125 2024-08-15 06:19:37,986 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-15 06:19:45,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3046870.0, ans=0.125 2024-08-15 06:19:49,191 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 06:19:52,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3046970.0, ans=0.2 2024-08-15 06:20:11,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-15 06:20:17,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3047070.0, ans=0.125 2024-08-15 06:20:27,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3047170.0, ans=0.0 2024-08-15 06:20:38,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3047170.0, ans=0.125 2024-08-15 06:20:40,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 400, loss[loss=0.1002, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.0884, over 19783.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01025, ecapa_loss=0.0001541, whisper_loss=0.09104, over 3264184.16 frames. ], batch size: 75, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:20:57,920 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 06:21:04,169 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 06:21:06,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3047370.0, ans=0.0 2024-08-15 06:21:27,461 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 06:21:31,854 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 06:21:46,974 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 06:21:54,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.307e+01 2.559e+01 2.888e+01 1.580e+02, threshold=5.118e+01, percent-clipped=5.0 2024-08-15 06:21:55,946 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 06:21:57,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 450, loss[loss=0.1192, beats_loss=0.009393, ecapa_loss=0.0001595, whisper_loss=0.1083, over 24030.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01029, ecapa_loss=0.000154, whisper_loss=0.09145, over 3399892.47 frames. 
], batch size: 93, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:22:40,983 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:22:41,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3047970.0, ans=0.125 2024-08-15 06:22:48,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3048070.0, ans=0.125 2024-08-15 06:22:56,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3048070.0, ans=0.125 2024-08-15 06:23:02,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048170.0, ans=0.1 2024-08-15 06:23:14,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 500, loss[loss=0.07557, beats_loss=0.01016, ecapa_loss=0.0001915, whisper_loss=0.06349, over 13189.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01028, ecapa_loss=0.0001524, whisper_loss=0.09134, over 3474534.58 frames. ], batch size: 57, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:23:34,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3048370.0, ans=0.2 2024-08-15 06:23:40,136 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 06:23:43,068 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-15 06:23:48,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-15 06:23:52,724 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
31 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-15 06:23:59,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048570.0, ans=0.125 2024-08-15 06:24:17,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3048670.0, ans=0.125 2024-08-15 06:24:28,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.268e+01 2.600e+01 2.909e+01 8.676e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-15 06:24:29,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3048670.0, ans=0.125 2024-08-15 06:24:31,576 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 550, loss[loss=0.1188, beats_loss=0.006796, ecapa_loss=0.0001444, whisper_loss=0.1106, over 15149.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01026, ecapa_loss=0.0001519, whisper_loss=0.09106, over 3552803.35 frames. ], batch size: 54, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:24:32,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3048770.0, ans=0.0 2024-08-15 06:25:18,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.74 vs. 
limit=15.0 2024-08-15 06:25:26,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3049070.0, ans=0.125 2024-08-15 06:25:35,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3049170.0, ans=0.125 2024-08-15 06:25:41,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049170.0, ans=0.1 2024-08-15 06:25:41,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-08-15 06:25:48,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 600, loss[loss=0.09383, beats_loss=0.01057, ecapa_loss=0.0001422, whisper_loss=0.08184, over 20845.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001508, whisper_loss=0.09033, over 3640821.14 frames. ], batch size: 79, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:25:48,626 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 06:25:52,062 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 06:25:57,076 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-15 06:25:58,606 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 06:26:25,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3049470.0, ans=0.125 2024-08-15 06:26:41,656 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 06:26:48,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3049570.0, ans=0.125 2024-08-15 06:26:51,352 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 06:26:51,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3049670.0, ans=0.125 2024-08-15 06:26:55,260 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.716e-03 2024-08-15 06:27:01,172 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 06:27:01,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3049670.0, ans=0.2 2024-08-15 06:27:03,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.381e+01 2.532e+01 2.729e+01 4.299e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 06:27:07,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 650, loss[loss=0.1025, beats_loss=0.0114, ecapa_loss=0.0001712, whisper_loss=0.08943, over 22409.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001505, whisper_loss=0.09021, over 3676107.05 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:27:44,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3049970.0, ans=0.02 2024-08-15 06:27:48,455 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-15 06:28:04,602 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 06:28:06,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3050070.0, ans=0.125 2024-08-15 06:28:08,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3050170.0, ans=0.0 2024-08-15 06:28:23,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 700, loss[loss=0.09951, beats_loss=0.01085, ecapa_loss=0.0001373, whisper_loss=0.08729, over 20581.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001509, whisper_loss=0.08975, over 3698600.77 frames. ], batch size: 79, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:28:24,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0 2024-08-15 06:28:41,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3050370.0, ans=0.125 2024-08-15 06:28:49,828 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 06:28:59,245 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 06:29:02,389 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 06:29:25,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3050670.0, ans=0.1 2024-08-15 06:29:26,874 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-15 06:29:30,260 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 06:29:38,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.268e+01 2.485e+01 2.982e+01 6.162e+01, threshold=4.969e+01, percent-clipped=2.0 2024-08-15 06:29:41,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 750, loss[loss=0.108, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.096, over 23528.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001504, whisper_loss=0.08973, over 3743130.16 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:29:43,610 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 18 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-15 06:29:46,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3050770.0, ans=0.0 2024-08-15 06:30:20,205 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 06:30:38,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3051070.0, ans=0.125 2024-08-15 06:30:54,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3051170.0, ans=0.1 2024-08-15 06:30:57,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 800, loss[loss=0.08707, beats_loss=0.01273, ecapa_loss=0.0001312, whisper_loss=0.07303, over 15003.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001506, whisper_loss=0.08944, over 3766710.64 frames. ], batch size: 59, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:31:06,965 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-15 06:31:10,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3051270.0, ans=0.125 2024-08-15 06:31:37,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3051470.0, ans=0.0 2024-08-15 06:31:41,533 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-15 06:31:57,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3051570.0, ans=0.125 2024-08-15 06:32:09,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3051670.0, ans=0.0 2024-08-15 06:32:11,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.305e+01 2.508e+01 2.943e+01 4.012e+02, threshold=5.016e+01, percent-clipped=1.0 2024-08-15 06:32:14,925 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 850, loss[loss=0.1003, beats_loss=0.01147, ecapa_loss=0.0001505, whisper_loss=0.08728, over 21810.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01048, ecapa_loss=0.0001499, whisper_loss=0.08914, over 3781636.82 frames. ], batch size: 86, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:32:25,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3051770.0, ans=0.1 2024-08-15 06:32:33,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3051870.0, ans=0.0 2024-08-15 06:32:35,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0 2024-08-15 06:32:36,314 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 06:32:44,384 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 06:32:45,797 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-15 06:33:09,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=12.0 2024-08-15 06:33:31,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3052170.0, ans=0.0 2024-08-15 06:33:33,835 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 900, loss[loss=0.1346, beats_loss=0.00687, ecapa_loss=0.0001651, whisper_loss=0.1261, over 14034.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01051, ecapa_loss=0.0001486, whisper_loss=0.08876, over 3770256.84 frames. ], batch size: 54, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:33:44,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.32 vs. limit=6.0 2024-08-15 06:34:01,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3052370.0, ans=0.07 2024-08-15 06:34:31,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. 
limit=15.0 2024-08-15 06:34:36,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3052670.0, ans=0.125 2024-08-15 06:34:39,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3052670.0, ans=0.1 2024-08-15 06:34:47,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.568e+01 3.064e+01 1.106e+02, threshold=5.136e+01, percent-clipped=1.0 2024-08-15 06:34:50,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 950, loss[loss=0.1179, beats_loss=0.009687, ecapa_loss=0.0001291, whisper_loss=0.1069, over 19389.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001484, whisper_loss=0.08911, over 3780986.65 frames. ], batch size: 72, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:34:51,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3052770.0, ans=0.125 2024-08-15 06:34:59,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3052770.0, ans=0.125 2024-08-15 06:35:05,106 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 06:35:38,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3053070.0, ans=0.2 2024-08-15 06:35:54,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. 
limit=22.5 2024-08-15 06:35:59,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3053170.0, ans=0.125 2024-08-15 06:36:08,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1000, loss[loss=0.08776, beats_loss=0.01143, ecapa_loss=0.0001466, whisper_loss=0.07486, over 16536.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001483, whisper_loss=0.08921, over 3818352.33 frames. ], batch size: 66, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:36:11,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2024-08-15 06:36:17,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-15 06:36:20,078 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 06:36:24,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3053370.0, ans=0.07 2024-08-15 06:36:24,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3053370.0, ans=0.125 2024-08-15 06:36:38,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3053370.0, ans=0.125 2024-08-15 06:36:48,124 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 06:36:58,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3053570.0, ans=0.125 2024-08-15 06:37:08,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3053570.0, ans=0.0 2024-08-15 06:37:23,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.275e+01 2.548e+01 2.900e+01 4.496e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-15 06:37:26,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1050, loss[loss=0.0858, beats_loss=0.007146, ecapa_loss=0.0001572, whisper_loss=0.07708, over 17325.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001474, whisper_loss=0.08923, over 3827950.25 frames. ], batch size: 67, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:37:28,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2024-08-15 06:37:31,054 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 06:37:37,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3053770.0, ans=0.125 2024-08-15 06:37:41,383 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08964061737060547, model_norm_threshold=50.96524429321289 2024-08-15 06:37:41,557 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.115e+05, grad_sumsq=1.116e+07, orig_rms_sq=9.994e-03 2024-08-15 06:37:42,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3053870.0, ans=0.2 2024-08-15 06:37:43,352 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 06:38:02,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3053970.0, ans=0.125 2024-08-15 06:38:27,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3054170.0, ans=0.0 2024-08-15 06:38:30,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3054170.0, ans=0.2 2024-08-15 06:38:33,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3054170.0, ans=0.125 2024-08-15 06:38:37,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3054170.0, ans=0.125 2024-08-15 06:38:43,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1100, loss[loss=0.06554, beats_loss=0.015, ecapa_loss=8.392e-05, whisper_loss=0.04969, over 16654.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001469, whisper_loss=0.08912, over 3814499.17 frames. ], batch size: 67, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:38:44,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3054270.0, ans=0.125 2024-08-15 06:38:51,998 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 06:39:02,045 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-15 06:39:14,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3054470.0, ans=0.125 2024-08-15 06:40:04,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3054570.0, ans=0.1 2024-08-15 06:40:12,036 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 06:40:28,954 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 06:40:32,061 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.402e+01 2.652e+01 3.039e+01 5.686e+02, threshold=5.304e+01, percent-clipped=1.0 2024-08-15 06:40:34,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1150, loss[loss=0.09585, beats_loss=0.01231, ecapa_loss=0.0001229, whisper_loss=0.08231, over 15015.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001474, whisper_loss=0.08975, over 3820147.77 frames. ], batch size: 58, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:41:06,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3054970.0, ans=0.125 2024-08-15 06:41:18,302 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-15 06:41:32,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3055070.0, ans=0.125 2024-08-15 06:41:36,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3055070.0, ans=0.0 2024-08-15 06:42:03,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1200, loss[loss=0.09044, beats_loss=0.01125, ecapa_loss=0.0001466, whisper_loss=0.07773, over 21850.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001475, whisper_loss=0.08942, over 3796503.17 frames. ], batch size: 90, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:42:41,048 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 06:43:06,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3055470.0, ans=0.125 2024-08-15 06:43:21,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3055570.0, ans=0.125 2024-08-15 06:43:35,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3055670.0, ans=0.125 2024-08-15 06:43:42,654 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 06:43:43,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.260e+01 2.457e+01 2.910e+01 3.777e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-15 06:43:48,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1250, loss[loss=0.09386, beats_loss=0.01313, ecapa_loss=0.0001639, whisper_loss=0.07909, over 20465.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01062, ecapa_loss=0.0001462, whisper_loss=0.08862, over 3794648.44 frames. 
], batch size: 82, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:44:01,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0 2024-08-15 06:44:32,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3055870.0, ans=0.125 2024-08-15 06:44:35,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3055970.0, ans=0.125 2024-08-15 06:44:41,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3055970.0, ans=0.09899494936611666 2024-08-15 06:44:55,428 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 06:44:59,748 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 06:45:02,939 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-15 06:45:49,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2024-08-15 06:45:53,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1300, loss[loss=0.08461, beats_loss=0.01169, ecapa_loss=0.0001302, whisper_loss=0.07161, over 19692.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01066, ecapa_loss=0.0001461, whisper_loss=0.08838, over 3801121.82 frames. ], batch size: 76, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:45:59,853 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 06:46:27,153 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-15 06:46:48,159 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 06:46:57,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3056470.0, ans=0.1 2024-08-15 06:47:08,290 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-15 06:47:44,910 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 06:47:50,179 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 06:47:51,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.277e+01 2.465e+01 2.867e+01 3.912e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-15 06:47:55,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1350, loss[loss=0.08451, beats_loss=0.01378, ecapa_loss=0.0001193, whisper_loss=0.06953, over 19971.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001459, whisper_loss=0.0897, over 3825629.12 frames. ], batch size: 78, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:48:06,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3056770.0, ans=0.0 2024-08-15 06:48:10,762 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 06:49:07,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3057070.0, ans=0.1 2024-08-15 06:49:10,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057070.0, ans=0.1 2024-08-15 06:49:23,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3057070.0, ans=0.125 2024-08-15 06:49:33,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3057170.0, ans=0.09899494936611666 2024-08-15 06:49:43,874 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 06:49:48,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1400, loss[loss=0.1173, beats_loss=0.006953, ecapa_loss=0.0001661, whisper_loss=0.1086, over 22291.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001467, whisper_loss=0.08973, over 3828941.48 frames. ], batch size: 88, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:50:05,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2024-08-15 06:50:12,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3057370.0, ans=0.0 2024-08-15 06:50:17,258 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-15 06:50:26,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3057470.0, ans=0.0 2024-08-15 06:50:29,233 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 06:50:29,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3057470.0, ans=0.5 2024-08-15 06:50:32,965 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-15 06:50:40,610 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 06:50:45,879 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-15 06:51:13,445 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.207e+01 2.496e+01 2.856e+01 4.886e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 06:51:56,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1450, loss[loss=0.1152, beats_loss=0.01043, ecapa_loss=0.0001384, whisper_loss=0.1034, over 24155.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001452, whisper_loss=0.08945, over 3815873.22 frames. ], batch size: 92, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:52:00,281 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 06:52:00,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3057770.0, ans=0.125 2024-08-15 06:52:23,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3057870.0, ans=0.125 2024-08-15 06:52:49,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3058070.0, ans=0.0 2024-08-15 06:53:16,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3058170.0, ans=0.0 2024-08-15 06:53:24,294 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 06:53:28,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3058270.0, ans=0.0 2024-08-15 06:53:28,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1500, loss[loss=0.109, beats_loss=0.009784, ecapa_loss=0.0001676, whisper_loss=0.09749, over 22009.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001456, whisper_loss=0.08968, over 3801103.97 frames. ], batch size: 90, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:53:42,795 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 35 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 06:53:50,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-08-15 06:53:58,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3058370.0, ans=0.1 2024-08-15 06:54:14,384 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:54:23,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3058470.0, ans=0.125 2024-08-15 06:54:27,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3058570.0, ans=0.0 2024-08-15 06:54:28,618 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
13 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-15 06:54:41,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3058570.0, ans=0.2 2024-08-15 06:54:42,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3058570.0, ans=15.0 2024-08-15 06:54:43,615 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:54:52,730 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 06:54:58,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.187e+01 2.410e+01 2.690e+01 4.725e+01, threshold=4.819e+01, percent-clipped=0.0 2024-08-15 06:55:02,241 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1550, loss[loss=0.08484, beats_loss=0.01207, ecapa_loss=0.0001474, whisper_loss=0.0713, over 20154.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001455, whisper_loss=0.0893, over 3805395.32 frames. ], batch size: 82, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:55:05,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-08-15 06:55:14,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-15 06:55:16,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3058770.0, ans=0.125 2024-08-15 06:55:20,052 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-15 06:55:48,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3058970.0, ans=0.0 2024-08-15 06:55:52,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-08-15 06:55:54,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3058970.0, ans=0.125 2024-08-15 06:56:13,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3059170.0, ans=0.0 2024-08-15 06:56:22,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3059170.0, ans=0.0 2024-08-15 06:56:28,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3059170.0, ans=0.125 2024-08-15 06:56:32,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1600, loss[loss=0.07792, beats_loss=0.01011, ecapa_loss=0.0001482, whisper_loss=0.06633, over 15941.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001452, whisper_loss=0.08959, over 3806019.59 frames. ], batch size: 62, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:56:33,224 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 06:56:38,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3059270.0, ans=0.09899494936611666 2024-08-15 06:56:41,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.45 vs. 
limit=22.5 2024-08-15 06:56:42,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3059270.0, ans=0.0 2024-08-15 06:56:42,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3059270.0, ans=0.2 2024-08-15 06:56:48,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3059270.0, ans=0.125 2024-08-15 06:56:58,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3059370.0, ans=0.125 2024-08-15 06:57:00,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3059370.0, ans=0.125 2024-08-15 06:57:00,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3059370.0, ans=0.2 2024-08-15 06:57:14,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3059470.0, ans=0.125 2024-08-15 06:57:21,026 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 06:57:36,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3059570.0, ans=0.0 2024-08-15 06:57:51,638 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-15 06:57:58,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.72 vs. 
limit=22.5 2024-08-15 06:57:59,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.263e+01 2.473e+01 2.776e+01 4.144e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-15 06:58:02,337 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1650, loss[loss=0.1207, beats_loss=0.008303, ecapa_loss=0.0001493, whisper_loss=0.1109, over 19604.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001464, whisper_loss=0.09019, over 3818037.92 frames. ], batch size: 73, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:58:04,487 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-15 06:58:08,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3059770.0, ans=0.0 2024-08-15 06:58:40,017 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 06:58:47,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3059970.0, ans=0.1 2024-08-15 06:58:51,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3059970.0, ans=0.125 2024-08-15 06:58:53,560 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 06:58:57,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-15 06:59:30,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1700, loss[loss=0.1007, beats_loss=0.01175, ecapa_loss=0.0001309, whisper_loss=0.08759, over 20908.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001461, whisper_loss=0.09096, over 3828752.43 frames. 
], batch size: 83, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 06:59:35,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3060270.0, ans=0.125 2024-08-15 06:59:37,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3060270.0, ans=0.125 2024-08-15 06:59:55,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-08-15 06:59:58,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.74 vs. limit=22.5 2024-08-15 07:00:25,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3060570.0, ans=0.125 2024-08-15 07:00:51,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.353e+01 2.609e+01 2.862e+01 3.979e+01, threshold=5.218e+01, percent-clipped=0.0 2024-08-15 07:00:54,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1750, loss[loss=0.112, beats_loss=0.009571, ecapa_loss=0.0001569, whisper_loss=0.1009, over 16533.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001455, whisper_loss=0.09088, over 3854913.77 frames. ], batch size: 65, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:01:08,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3060770.0, ans=0.125 2024-08-15 07:01:10,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-15 07:01:15,709 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-15 07:01:18,602 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-15 07:01:21,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3060870.0, ans=0.125 2024-08-15 07:01:47,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3061070.0, ans=0.125 2024-08-15 07:02:04,567 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 07:02:15,279 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1800, loss[loss=0.1064, beats_loss=0.01096, ecapa_loss=0.000113, whisper_loss=0.09435, over 22549.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001454, whisper_loss=0.09074, over 3808134.60 frames. ], batch size: 85, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:02:22,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3061270.0, ans=0.125 2024-08-15 07:02:24,089 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
20 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-15 07:02:45,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3061370.0, ans=0.1 2024-08-15 07:02:54,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3061470.0, ans=0.95 2024-08-15 07:02:56,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3061470.0, ans=0.1 2024-08-15 07:03:13,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3061570.0, ans=0.125 2024-08-15 07:03:15,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-15 07:03:20,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3061670.0, ans=0.0 2024-08-15 07:03:31,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.281e+01 2.524e+01 2.715e+01 4.496e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-15 07:03:35,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1850, loss[loss=0.1068, beats_loss=0.01213, ecapa_loss=0.0001336, whisper_loss=0.09333, over 17050.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001465, whisper_loss=0.0902, over 3768065.13 frames. ], batch size: 67, lr: 2.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:03:43,282 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 07:03:44,724 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-15 07:03:52,829 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-15 07:03:59,683 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 07:04:17,898 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 07:04:40,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3062170.0, ans=0.125 2024-08-15 07:04:42,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3062170.0, ans=0.0 2024-08-15 07:04:43,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3062170.0, ans=0.0 2024-08-15 07:04:54,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.85 vs. limit=22.5 2024-08-15 07:04:55,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3062270.0, ans=0.125 2024-08-15 07:04:55,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1900, loss[loss=0.08238, beats_loss=0.0114, ecapa_loss=0.0001755, whisper_loss=0.06923, over 14899.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001461, whisper_loss=0.08989, over 3793073.77 frames. ], batch size: 65, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:04:56,389 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 07:05:03,277 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-15 07:05:11,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3062370.0, ans=0.1 2024-08-15 07:05:30,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2024-08-15 07:05:43,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3062570.0, ans=0.04949747468305833 2024-08-15 07:05:47,885 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 07:06:12,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.324e+01 2.578e+01 2.950e+01 1.570e+02, threshold=5.156e+01, percent-clipped=1.0 2024-08-15 07:06:15,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 1950, loss[loss=0.1096, beats_loss=0.01075, ecapa_loss=0.0001439, whisper_loss=0.09738, over 21894.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01056, ecapa_loss=0.0001462, whisper_loss=0.0892, over 3773946.26 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:06:17,635 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 07:06:21,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3062770.0, ans=0.125 2024-08-15 07:06:25,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3062770.0, ans=0.125 2024-08-15 07:06:27,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3062770.0, ans=0.07 2024-08-15 07:06:28,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3062770.0, ans=0.0 2024-08-15 07:06:38,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2024-08-15 07:06:40,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2024-08-15 07:06:52,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-15 07:07:05,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3063070.0, ans=0.125 2024-08-15 07:07:35,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2000, loss[loss=0.09507, beats_loss=0.0105, ecapa_loss=0.0001616, whisper_loss=0.08295, over 19489.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001461, whisper_loss=0.08927, over 3765122.03 frames. ], batch size: 79, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:07:51,618 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 07:08:06,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3063470.0, ans=0.125 2024-08-15 07:08:18,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3063470.0, ans=0.0 2024-08-15 07:08:28,054 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-15 07:08:48,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3063670.0, ans=0.0 2024-08-15 07:08:51,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.304e+01 2.556e+01 2.865e+01 6.565e+01, threshold=5.113e+01, percent-clipped=1.0 2024-08-15 07:08:53,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3063770.0, ans=0.0 2024-08-15 07:08:54,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2050, loss[loss=0.09938, beats_loss=0.01155, ecapa_loss=0.0001257, whisper_loss=0.08657, over 23809.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001464, whisper_loss=0.08929, over 3771196.52 frames. ], batch size: 95, lr: 2.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 07:09:08,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3063770.0, ans=0.0 2024-08-15 07:09:14,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.27 vs. 
limit=22.5 2024-08-15 07:09:33,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3063970.0, ans=0.2 2024-08-15 07:09:38,045 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-15 07:10:05,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3064170.0, ans=0.125 2024-08-15 07:10:12,796 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2100, loss[loss=0.1109, beats_loss=0.009119, ecapa_loss=0.0001708, whisper_loss=0.1001, over 16139.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001461, whisper_loss=0.08927, over 3778963.86 frames. ], batch size: 61, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:10:27,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-15 07:10:37,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3064370.0, ans=0.125 2024-08-15 07:10:59,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3064570.0, ans=0.125 2024-08-15 07:11:28,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.342e+01 2.621e+01 2.964e+01 3.863e+02, threshold=5.241e+01, percent-clipped=3.0 2024-08-15 07:11:32,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2150, loss[loss=0.09443, beats_loss=0.01312, ecapa_loss=0.0001161, whisper_loss=0.08015, over 15680.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.0001449, whisper_loss=0.08907, over 3824158.21 frames. 
], batch size: 62, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:11:40,345 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 07:11:43,540 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 07:11:48,474 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-15 07:11:55,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3064870.0, ans=0.125 2024-08-15 07:11:55,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-08-15 07:12:08,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3064970.0, ans=0.0 2024-08-15 07:12:08,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3064970.0, ans=0.125 2024-08-15 07:12:21,736 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-15 07:12:33,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3065070.0, ans=0.125 2024-08-15 07:12:39,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3065170.0, ans=0.125 2024-08-15 07:12:44,084 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 07:12:50,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3065170.0, ans=0.0 2024-08-15 07:12:53,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3065270.0, ans=0.125 2024-08-15 07:12:53,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2200, loss[loss=0.09632, beats_loss=0.01064, ecapa_loss=0.0001611, whisper_loss=0.08406, over 17290.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001461, whisper_loss=0.08955, over 3799745.62 frames. ], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:12:54,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3065270.0, ans=0.05 2024-08-15 07:12:58,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2024-08-15 07:13:13,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.59 vs. limit=22.5 2024-08-15 07:13:18,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.41 vs. limit=22.5 2024-08-15 07:13:32,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3065470.0, ans=0.1 2024-08-15 07:13:34,213 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 07:13:43,141 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 07:13:49,477 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 07:14:04,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-15 07:14:09,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.287e+01 2.531e+01 2.773e+01 4.088e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 07:14:11,796 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-15 07:14:12,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2250, loss[loss=0.1047, beats_loss=0.007952, ecapa_loss=0.0001836, whisper_loss=0.09492, over 14893.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001461, whisper_loss=0.09018, over 3809185.24 frames. ], batch size: 59, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:14:13,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3065770.0, ans=0.04949747468305833 2024-08-15 07:14:15,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0 2024-08-15 07:14:35,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3065870.0, ans=0.125 2024-08-15 07:14:42,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.58 vs. limit=22.5 2024-08-15 07:14:50,739 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 07:14:56,705 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 07:15:01,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3066070.0, ans=0.2 2024-08-15 07:15:14,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3066070.0, ans=0.1 2024-08-15 07:15:22,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3066170.0, ans=0.2 2024-08-15 07:15:25,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-08-15 07:15:33,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3066270.0, ans=0.125 2024-08-15 07:15:34,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2300, loss[loss=0.08461, beats_loss=0.009513, ecapa_loss=0.0001501, whisper_loss=0.07359, over 14480.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001461, whisper_loss=0.09041, over 3865547.73 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:16:09,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3066470.0, ans=0.125 2024-08-15 07:16:15,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=15.0 2024-08-15 07:16:17,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3066470.0, ans=0.2 2024-08-15 07:16:21,149 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.724e-02 2024-08-15 07:16:35,464 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-15 07:16:50,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.352e+01 2.574e+01 2.900e+01 4.946e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-15 07:16:53,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2350, loss[loss=0.1192, beats_loss=0.008727, ecapa_loss=0.0001342, whisper_loss=0.1092, over 22732.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.09091, over 3849910.05 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:16:57,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3066770.0, ans=0.1 2024-08-15 07:17:12,181 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 07:17:18,576 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 07:17:24,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3066870.0, ans=0.125 2024-08-15 07:17:38,309 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 07:17:43,757 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 13 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 07:17:44,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3067070.0, ans=0.125 2024-08-15 07:17:57,538 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
16 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-15 07:17:57,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3067170.0, ans=0.0 2024-08-15 07:18:12,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2400, loss[loss=0.09846, beats_loss=0.01361, ecapa_loss=0.0001085, whisper_loss=0.08376, over 20729.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001479, whisper_loss=0.09107, over 3833668.57 frames. ], batch size: 81, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:18:16,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3067270.0, ans=0.2 2024-08-15 07:18:17,204 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 07:18:18,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3067270.0, ans=0.0 2024-08-15 07:18:22,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3067270.0, ans=0.125 2024-08-15 07:18:33,977 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-15 07:19:00,211 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 07:19:26,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.236e+01 2.451e+01 2.781e+01 1.373e+02, threshold=4.902e+01, percent-clipped=2.0 2024-08-15 07:19:27,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2024-08-15 07:19:29,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2450, loss[loss=0.1141, beats_loss=0.0099, ecapa_loss=0.0001574, whisper_loss=0.1027, over 16255.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.09077, over 3803145.21 frames. ], batch size: 64, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:19:32,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3067770.0, ans=0.0 2024-08-15 07:19:42,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2024-08-15 07:20:43,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068170.0, ans=0.1 2024-08-15 07:20:48,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2500, loss[loss=0.08126, beats_loss=0.01135, ecapa_loss=0.0001685, whisper_loss=0.06822, over 19311.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.000146, whisper_loss=0.0904, over 3825133.98 frames. ], batch size: 83, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:20:51,143 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 07:21:02,109 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-15 07:21:02,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-15 07:21:04,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. 
limit=15.0 2024-08-15 07:21:17,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3068370.0, ans=0.2 2024-08-15 07:21:35,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3068570.0, ans=0.0 2024-08-15 07:22:02,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3068670.0, ans=0.0 2024-08-15 07:22:04,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.249e+01 2.498e+01 2.918e+01 4.518e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 07:22:05,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3068670.0, ans=0.0 2024-08-15 07:22:07,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2550, loss[loss=0.1135, beats_loss=0.008257, ecapa_loss=0.0001707, whisper_loss=0.1035, over 16081.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001468, whisper_loss=0.09052, over 3837463.53 frames. ], batch size: 64, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:22:09,234 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 07:22:12,439 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 07:22:24,867 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 07:22:43,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3068970.0, ans=0.125 2024-08-15 07:23:07,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. 
limit=10.0 2024-08-15 07:23:25,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2600, loss[loss=0.122, beats_loss=0.009625, ecapa_loss=0.0001675, whisper_loss=0.1107, over 23034.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001477, whisper_loss=0.09084, over 3853833.27 frames. ], batch size: 93, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:23:54,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3069370.0, ans=0.125 2024-08-15 07:23:56,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3069470.0, ans=0.07 2024-08-15 07:23:57,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3069470.0, ans=0.125 2024-08-15 07:24:08,539 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 07:24:10,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3069470.0, ans=0.0 2024-08-15 07:24:12,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-15 07:24:40,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.364e+01 2.551e+01 2.920e+01 2.244e+02, threshold=5.103e+01, percent-clipped=2.0 2024-08-15 07:24:43,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2650, loss[loss=0.1219, beats_loss=0.00908, ecapa_loss=0.0001495, whisper_loss=0.1113, over 23350.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001486, whisper_loss=0.09063, over 3851235.98 frames. 
], batch size: 93, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:24:45,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3069770.0, ans=0.125 2024-08-15 07:25:20,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3069970.0, ans=0.0 2024-08-15 07:25:34,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3070070.0, ans=0.125 2024-08-15 07:25:37,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 07:25:45,514 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 07:25:52,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3070170.0, ans=0.0 2024-08-15 07:25:56,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3070170.0, ans=0.0 2024-08-15 07:26:02,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2700, loss[loss=0.09118, beats_loss=0.009349, ecapa_loss=0.0001516, whisper_loss=0.08031, over 16605.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001487, whisper_loss=0.09066, over 3861072.49 frames. ], batch size: 66, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:26:03,858 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 07:26:47,078 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 07:26:49,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3070570.0, ans=0.125 2024-08-15 07:26:53,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3070570.0, ans=0.0 2024-08-15 07:26:57,675 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 07:27:00,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.47 vs. limit=22.5 2024-08-15 07:27:12,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3070670.0, ans=0.0 2024-08-15 07:27:17,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3070670.0, ans=0.125 2024-08-15 07:27:17,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.236e+01 2.522e+01 2.732e+01 2.329e+02, threshold=5.045e+01, percent-clipped=1.0 2024-08-15 07:27:21,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2750, loss[loss=0.1142, beats_loss=0.009414, ecapa_loss=0.0001716, whisper_loss=0.103, over 19829.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001496, whisper_loss=0.09094, over 3870525.92 frames. ], batch size: 80, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:27:28,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-08-15 07:27:30,719 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
11 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 07:27:39,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3070870.0, ans=0.2 2024-08-15 07:27:42,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3070870.0, ans=0.1 2024-08-15 07:28:09,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3071070.0, ans=0.125 2024-08-15 07:28:13,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3071070.0, ans=0.0 2024-08-15 07:28:37,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3071170.0, ans=0.125 2024-08-15 07:28:39,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3071270.0, ans=0.125 2024-08-15 07:28:39,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2800, loss[loss=0.1109, beats_loss=0.01065, ecapa_loss=0.0001409, whisper_loss=0.09885, over 22185.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.09098, over 3856119.92 frames. ], batch size: 88, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:28:47,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071270.0, ans=0.1 2024-08-15 07:29:00,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3071370.0, ans=0.125 2024-08-15 07:29:13,864 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-15 07:29:16,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3071470.0, ans=0.1 2024-08-15 07:29:19,796 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-15 07:29:22,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3071470.0, ans=0.2 2024-08-15 07:29:31,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3071570.0, ans=0.2 2024-08-15 07:29:33,560 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 07:29:39,073 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.232e-02 2024-08-15 07:29:44,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-15 07:29:47,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3071670.0, ans=0.2 2024-08-15 07:29:55,946 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-15 07:29:57,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.358e+01 2.558e+01 2.902e+01 4.968e+01, threshold=5.116e+01, percent-clipped=0.0 2024-08-15 07:30:00,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2850, loss[loss=0.07221, beats_loss=0.01162, ecapa_loss=0.0001292, whisper_loss=0.0593, over 17601.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001494, whisper_loss=0.09021, over 3846161.97 frames. 
], batch size: 71, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:30:15,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3071870.0, ans=0.2 2024-08-15 07:30:28,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3071870.0, ans=22.5 2024-08-15 07:30:37,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3071970.0, ans=0.125 2024-08-15 07:30:42,230 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 07:30:43,464 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 07:30:50,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-15 07:30:51,832 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 07:31:10,780 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 29 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-15 07:31:18,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2900, loss[loss=0.1184, beats_loss=0.009396, ecapa_loss=0.0001311, whisper_loss=0.1077, over 15460.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001508, whisper_loss=0.09081, over 3861171.96 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:31:25,168 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 07:31:34,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3072370.0, ans=0.125 2024-08-15 07:31:46,666 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 07:31:57,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3072470.0, ans=0.0 2024-08-15 07:31:57,928 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-15 07:32:05,979 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 07:32:15,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3072570.0, ans=0.125 2024-08-15 07:32:24,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3072670.0, ans=0.125 2024-08-15 07:32:26,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-08-15 07:32:31,417 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 07:32:32,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.406e+01 2.599e+01 2.781e+01 4.544e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-15 07:32:32,972 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 07:32:34,545 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 07:32:35,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 2950, loss[loss=0.08734, beats_loss=0.008312, ecapa_loss=0.0001377, whisper_loss=0.07765, over 16903.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001516, whisper_loss=0.09031, over 3867389.36 frames. 
], batch size: 64, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:32:50,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3072870.0, ans=0.0 2024-08-15 07:32:53,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3072870.0, ans=0.125 2024-08-15 07:32:53,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3072870.0, ans=0.0 2024-08-15 07:33:01,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3072870.0, ans=0.125 2024-08-15 07:33:07,858 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 07:33:32,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-08-15 07:33:42,458 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 07:33:43,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2024-08-15 07:33:48,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3000, loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.0001479, whisper_loss=0.08857, over 19439.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.09135, over 3904064.80 frames. ], batch size: 78, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:33:48,962 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 07:34:30,946 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005255, whisper_loss=0.2469, over 922467.00 frames. 
2024-08-15 07:34:46,524 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on SV_voxceleb1: loss=0.004113, beats_loss=0, ecapa_loss=0.0004113, whisper_loss=0, over 939242.00 frames. 2024-08-15 07:36:48,663 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 07:36:48,667 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 07:37:29,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3073470.0, ans=0.125 2024-08-15 07:37:41,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3073570.0, ans=0.0 2024-08-15 07:37:45,734 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 07:37:51,114 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 07:37:59,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.341e+01 2.586e+01 2.957e+01 4.096e+01, threshold=5.172e+01, percent-clipped=0.0 2024-08-15 07:38:00,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3073670.0, ans=0.0 2024-08-15 07:38:02,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3050, loss[loss=0.118, beats_loss=0.008955, ecapa_loss=0.0001729, whisper_loss=0.1073, over 21973.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01045, ecapa_loss=0.0001513, whisper_loss=0.09229, over 3910785.14 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:38:08,915 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-15 07:38:30,555 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-15 07:38:34,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3073970.0, ans=0.2 2024-08-15 07:38:36,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2024-08-15 07:38:57,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074070.0, ans=0.125 2024-08-15 07:39:02,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074170.0, ans=0.1 2024-08-15 07:39:07,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3074170.0, ans=0.125 2024-08-15 07:39:12,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0 2024-08-15 07:39:15,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3100, loss[loss=0.1409, beats_loss=0.008162, ecapa_loss=0.0001743, whisper_loss=0.131, over 21487.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01054, ecapa_loss=0.0001523, whisper_loss=0.0922, over 3914223.18 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:39:37,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074370.0, ans=0.125 2024-08-15 07:39:47,855 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-15 07:39:54,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-08-15 07:39:55,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3074470.0, ans=0.125 2024-08-15 07:39:59,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3074570.0, ans=0.2 2024-08-15 07:40:17,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-15 07:40:20,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074670.0, ans=0.1 2024-08-15 07:40:20,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-15 07:40:22,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074670.0, ans=0.125 2024-08-15 07:40:23,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.264e+01 2.554e+01 2.842e+01 4.812e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-15 07:40:26,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3150, loss[loss=0.112, beats_loss=0.00963, ecapa_loss=0.0001394, whisper_loss=0.1009, over 20220.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0106, ecapa_loss=0.0001521, whisper_loss=0.09141, over 3897541.31 frames. ], batch size: 79, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:40:29,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3074770.0, ans=0.125 2024-08-15 07:40:45,858 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 07:41:10,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3075070.0, ans=0.125 2024-08-15 07:41:18,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3075070.0, ans=0.025 2024-08-15 07:41:28,301 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 07:41:37,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3075170.0, ans=0.0 2024-08-15 07:41:39,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3200, loss[loss=0.08653, beats_loss=0.01228, ecapa_loss=0.0001505, whisper_loss=0.07274, over 15459.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.000152, whisper_loss=0.09132, over 3899108.84 frames. ], batch size: 66, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:41:44,544 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 07:42:11,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3075470.0, ans=0.0 2024-08-15 07:42:25,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3075570.0, ans=0.015 2024-08-15 07:42:36,498 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 07:42:48,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.323e+01 2.639e+01 2.854e+01 4.930e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 07:42:51,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3250, loss[loss=0.1121, beats_loss=0.009643, ecapa_loss=0.0001228, whisper_loss=0.1013, over 23962.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001524, whisper_loss=0.09156, over 3893415.68 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:42:55,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3075770.0, ans=0.125 2024-08-15 07:42:56,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3075770.0, ans=0.125 2024-08-15 07:43:11,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3075870.0, ans=0.0 2024-08-15 07:43:18,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3075970.0, ans=0.125 2024-08-15 07:43:35,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3076070.0, ans=0.125 2024-08-15 07:43:40,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-15 07:43:51,693 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 07:44:00,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.96 vs. limit=22.5 2024-08-15 07:44:01,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3300, loss[loss=0.087, beats_loss=0.01219, ecapa_loss=0.0001102, whisper_loss=0.0737, over 15319.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001525, whisper_loss=0.09167, over 3883511.11 frames. 
], batch size: 58, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:44:15,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-15 07:44:27,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3076370.0, ans=0.125 2024-08-15 07:44:33,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3076470.0, ans=0.0 2024-08-15 07:44:35,957 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 07:44:44,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3076570.0, ans=0.125 2024-08-15 07:45:11,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.361e+01 2.617e+01 2.908e+01 9.847e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-15 07:45:11,494 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 07:45:13,918 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3350, loss[loss=0.0826, beats_loss=0.01088, ecapa_loss=0.0001794, whisper_loss=0.06992, over 14590.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001514, whisper_loss=0.09113, over 3866142.65 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:45:15,789 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 07:45:30,018 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 07:45:34,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3076870.0, ans=0.125 2024-08-15 07:45:43,893 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 07:45:46,511 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 07:46:01,147 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 07:46:07,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3077070.0, ans=0.04949747468305833 2024-08-15 07:46:16,788 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-15 07:46:19,601 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 20 from LS+wenet, 29 from Vox, 44 fro AS 2024-08-15 07:46:19,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3077170.0, ans=0.125 2024-08-15 07:46:24,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3400, loss[loss=0.1062, beats_loss=0.01006, ecapa_loss=0.0001616, whisper_loss=0.0945, over 21767.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001502, whisper_loss=0.09188, over 3906518.74 frames. ], batch size: 91, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:46:30,122 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 07:46:39,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-15 07:47:01,832 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 07:47:30,010 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 07:47:32,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.325e+01 2.575e+01 2.904e+01 4.420e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 07:47:35,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3450, loss[loss=0.09816, beats_loss=0.01122, ecapa_loss=0.0001793, whisper_loss=0.08515, over 15747.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001511, whisper_loss=0.09106, over 3892564.31 frames. ], batch size: 67, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:48:15,902 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 07:48:20,403 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-15 07:48:29,063 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 07:48:31,670 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 07:48:37,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3078170.0, ans=0.2 2024-08-15 07:48:46,727 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-15 07:48:46,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3078170.0, ans=0.125 2024-08-15 07:48:48,044 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-15 07:48:49,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3500, loss[loss=0.1176, beats_loss=0.007948, ecapa_loss=0.0002076, whisper_loss=0.1076, over 14152.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001508, whisper_loss=0.09089, over 3902363.45 frames. ], batch size: 57, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:49:09,510 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 07:49:25,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3078470.0, ans=0.125 2024-08-15 07:49:35,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-08-15 07:49:52,197 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 07:49:57,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.337e+01 2.595e+01 2.865e+01 3.542e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-15 07:49:57,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3078670.0, ans=0.125 2024-08-15 07:49:59,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3550, loss[loss=0.1171, beats_loss=0.009266, ecapa_loss=0.0001528, whisper_loss=0.1063, over 19740.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001512, whisper_loss=0.09072, over 3880482.56 frames. ], batch size: 77, lr: 2.85e-03, grad_scale: 5.764607523034235e+17 2024-08-15 07:50:03,114 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 07:50:04,642 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-15 07:50:05,926 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
13 from LS+wenet, 17 from Vox, 25 from AS
2024-08-15 07:50:10,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3078770.0, ans=0.2
2024-08-15 07:50:29,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0
2024-08-15 07:50:46,297 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS
2024-08-15 07:50:46,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3079070.0, ans=0.2
2024-08-15 07:50:59,102 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 10 from Vox, 32 from AS
2024-08-15 07:51:12,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3600, loss[loss=0.1202, beats_loss=0.008782, ecapa_loss=0.0001719, whisper_loss=0.1097, over 16688.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001519, whisper_loss=0.0911, over 3852119.99 frames. ], batch size: 67, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:51:12,842 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-15 07:51:14,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3079270.0, ans=0.0
2024-08-15 07:51:22,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=22.5
2024-08-15 07:51:27,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3079370.0, ans=0.2
2024-08-15 07:51:28,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3079370.0, ans=0.125
2024-08-15 07:51:35,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3079370.0, ans=0.1
2024-08-15 07:51:37,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0
2024-08-15 07:51:44,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3079470.0, ans=0.07
2024-08-15 07:51:50,115 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 from AS
2024-08-15 07:52:13,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3079670.0, ans=0.0
2024-08-15 07:52:16,177 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 from AS
2024-08-15 07:52:21,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.253e+01 2.466e+01 2.769e+01 4.270e+01, threshold=4.932e+01, percent-clipped=0.0
2024-08-15 07:52:24,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3650, loss[loss=0.0946, beats_loss=0.01158, ecapa_loss=0.0001551, whisper_loss=0.08147, over 16930.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001517, whisper_loss=0.09067, over 3823205.82 frames. ], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:52:26,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3079770.0, ans=0.0
2024-08-15 07:52:34,235 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS
2024-08-15 07:52:36,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3079770.0, ans=0.2
2024-08-15 07:52:43,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3079870.0, ans=0.0
2024-08-15 07:52:53,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3079970.0, ans=0.125
2024-08-15 07:53:02,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3079970.0, ans=0.2
2024-08-15 07:53:08,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=15.0
2024-08-15 07:53:11,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0
2024-08-15 07:53:26,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=12.0
2024-08-15 07:53:27,352 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 12 from Vox, 24 from AS
2024-08-15 07:53:31,542 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 12 from Vox, 23 from AS
2024-08-15 07:53:38,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3700, loss[loss=0.09782, beats_loss=0.01087, ecapa_loss=0.0001407, whisper_loss=0.08554, over 13959.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001523, whisper_loss=0.09025, over 3822224.58 frames. ], batch size: 54, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:53:39,820 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 21 from Vox, 18 from AS
2024-08-15 07:53:47,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3080270.0, ans=0.125
2024-08-15 07:53:55,303 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 from AS
2024-08-15 07:54:03,112 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 27 from Vox, 32 from AS
2024-08-15 07:54:29,597 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 from AS
2024-08-15 07:54:32,664 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 26 from Vox, 47 from AS
2024-08-15 07:54:38,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3080670.0, ans=0.125
2024-08-15 07:54:40,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3080670.0, ans=0.125
2024-08-15 07:54:45,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.379e+01 2.603e+01 2.924e+01 1.234e+02, threshold=5.207e+01, percent-clipped=1.0
2024-08-15 07:54:46,770 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 07:54:47,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3080770.0, ans=0.125
2024-08-15 07:54:47,824 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3750, loss[loss=0.1119, beats_loss=0.01097, ecapa_loss=0.0001424, whisper_loss=0.09952, over 18848.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001523, whisper_loss=0.09026, over 3836977.94 frames. ], batch size: 76, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:55:04,473 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS
2024-08-15 07:55:12,711 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 22 from Vox, 48 from AS
2024-08-15 07:55:17,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0
2024-08-15 07:55:32,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3081070.0, ans=0.125
2024-08-15 07:55:33,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3081070.0, ans=0.125
2024-08-15 07:55:35,103 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 from AS
2024-08-15 07:55:43,482 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 07:55:56,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3800, loss[loss=0.08923, beats_loss=0.01237, ecapa_loss=0.0001367, whisper_loss=0.07549, over 17497.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001523, whisper_loss=0.09029, over 3865351.25 frames. ], batch size: 70, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:56:09,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3081370.0, ans=0.125
2024-08-15 07:56:23,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3081470.0, ans=0.125
2024-08-15 07:56:34,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3081470.0, ans=0.0
2024-08-15 07:56:43,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.89 vs. limit=10.0
2024-08-15 07:56:55,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0
2024-08-15 07:57:02,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.325e+01 2.607e+01 2.959e+01 1.127e+02, threshold=5.215e+01, percent-clipped=1.0
2024-08-15 07:57:05,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3850, loss[loss=0.112, beats_loss=0.01127, ecapa_loss=0.0001439, whisper_loss=0.09926, over 21997.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001519, whisper_loss=0.0906, over 3877833.09 frames. ], batch size: 90, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:57:28,559 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 07:57:31,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3081970.0, ans=0.0
2024-08-15 07:57:36,686 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 11 from Vox, 24 from AS
2024-08-15 07:57:52,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3082070.0, ans=0.1
2024-08-15 07:58:11,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3082170.0, ans=0.2
2024-08-15 07:58:14,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3900, loss[loss=0.09062, beats_loss=0.008925, ecapa_loss=0.0001547, whisper_loss=0.08015, over 20815.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01065, ecapa_loss=0.0001524, whisper_loss=0.09112, over 3920788.15 frames. ], batch size: 84, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:58:21,180 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 13 from Vox, 37 from AS
2024-08-15 07:58:41,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3082470.0, ans=0.2
2024-08-15 07:59:10,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3082670.0, ans=0.125
2024-08-15 07:59:14,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3082670.0, ans=0.2
2024-08-15 07:59:15,186 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 10 from Vox, 26 from AS
2024-08-15 07:59:19,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.371e+01 2.557e+01 2.985e+01 4.331e+01, threshold=5.113e+01, percent-clipped=0.0
2024-08-15 07:59:21,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3082770.0, ans=0.025
2024-08-15 07:59:22,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 3950, loss[loss=0.1363, beats_loss=0.007699, ecapa_loss=0.0001844, whisper_loss=0.1267, over 24083.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001522, whisper_loss=0.09167, over 3928837.32 frames. ], batch size: 94, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 07:59:39,113 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 from AS
2024-08-15 07:59:40,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0
2024-08-15 07:59:43,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0
2024-08-15 07:59:57,412 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 from AS
2024-08-15 08:00:22,347 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 26 from Vox, 30 from AS
2024-08-15 08:00:26,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3083170.0, ans=0.125
2024-08-15 08:00:31,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4000, loss[loss=0.1042, beats_loss=0.008485, ecapa_loss=0.0001819, whisper_loss=0.09392, over 18649.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001532, whisper_loss=0.09142, over 3946866.81 frames. ], batch size: 73, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:00:40,806 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 08:00:50,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3083370.0, ans=0.1
2024-08-15 08:00:55,774 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 from AS
2024-08-15 08:00:58,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3083470.0, ans=0.0
2024-08-15 08:01:17,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3083570.0, ans=0.125
2024-08-15 08:01:25,221 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS
2024-08-15 08:01:31,202 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 25 from Vox, 38 from AS
2024-08-15 08:01:35,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3083670.0, ans=0.2
2024-08-15 08:01:39,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.375e+01 2.633e+01 2.839e+01 4.243e+01, threshold=5.267e+01, percent-clipped=0.0
2024-08-15 08:01:42,101 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4050, loss[loss=0.08173, beats_loss=0.01311, ecapa_loss=0.000135, whisper_loss=0.06727, over 21734.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001539, whisper_loss=0.0915, over 3959760.70 frames. ], batch size: 87, lr: 2.85e-03, grad_scale: 5.764607523034235e+17
2024-08-15 08:02:04,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3083870.0, ans=0.125
2024-08-15 08:02:04,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3083870.0, ans=0.125
2024-08-15 08:02:08,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3083970.0, ans=0.1
2024-08-15 08:02:16,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5
2024-08-15 08:02:19,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3083970.0, ans=0.07
2024-08-15 08:02:23,776 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 from AS
2024-08-15 08:02:24,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0
2024-08-15 08:02:26,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3084070.0, ans=0.0
2024-08-15 08:02:36,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3084170.0, ans=0.1
2024-08-15 08:02:45,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3084170.0, ans=0.09899494936611666
2024-08-15 08:02:49,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3084270.0, ans=0.125
2024-08-15 08:02:50,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4100, loss[loss=0.08631, beats_loss=0.009508, ecapa_loss=0.0001984, whisper_loss=0.07482, over 17563.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01055, ecapa_loss=0.0001538, whisper_loss=0.09201, over 3954517.56 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:03:03,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3084370.0, ans=10.0
2024-08-15 08:03:07,797 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS
2024-08-15 08:03:13,321 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 08:03:20,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3084470.0, ans=0.125
2024-08-15 08:03:25,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3084470.0, ans=0.0
2024-08-15 08:03:28,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3084470.0, ans=0.125
2024-08-15 08:03:44,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=12.0
2024-08-15 08:03:46,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3084670.0, ans=0.125
2024-08-15 08:03:57,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.396e+01 2.573e+01 2.888e+01 2.852e+02, threshold=5.147e+01, percent-clipped=1.0
2024-08-15 08:03:58,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3084670.0, ans=0.0
2024-08-15 08:04:00,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4150, loss[loss=0.1008, beats_loss=0.01244, ecapa_loss=0.0001395, whisper_loss=0.08693, over 22607.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01063, ecapa_loss=0.000153, whisper_loss=0.0917, over 3938608.27 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:04:04,336 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 from AS
2024-08-15 08:04:13,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3084870.0, ans=0.125
2024-08-15 08:04:28,360 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 from AS
2024-08-15 08:04:37,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3084970.0, ans=0.125
2024-08-15 08:04:38,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3084970.0, ans=0.125
2024-08-15 08:04:49,050 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 27 from Vox, 25 from AS
2024-08-15 08:05:09,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4200, loss[loss=0.1105, beats_loss=0.01024, ecapa_loss=0.0001429, whisper_loss=0.09885, over 22580.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001519, whisper_loss=0.09177, over 3926075.29 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:05:09,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3085270.0, ans=0.125
2024-08-15 08:05:19,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3085270.0, ans=0.125
2024-08-15 08:05:34,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3085370.0, ans=0.125
2024-08-15 08:05:38,022 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 from AS
2024-08-15 08:05:41,975 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 from AS
2024-08-15 08:05:45,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085470.0, ans=0.1
2024-08-15 08:05:50,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0
2024-08-15 08:06:09,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3085670.0, ans=0.0
2024-08-15 08:06:15,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.224e+01 2.413e+01 2.831e+01 9.655e+01, threshold=4.827e+01, percent-clipped=1.0
2024-08-15 08:06:18,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4250, loss[loss=0.08933, beats_loss=0.01319, ecapa_loss=0.0001096, whisper_loss=0.07505, over 19824.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001506, whisper_loss=0.09071, over 3927223.93 frames. ], batch size: 77, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:06:18,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3085770.0, ans=0.125
2024-08-15 08:06:42,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3085870.0, ans=0.1
2024-08-15 08:06:49,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0
2024-08-15 08:06:51,825 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 14 from Vox, 31 from AS
2024-08-15 08:06:54,788 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 from AS
2024-08-15 08:07:00,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3086070.0, ans=0.125
2024-08-15 08:07:02,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=12.0
2024-08-15 08:07:06,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086070.0, ans=0.1
2024-08-15 08:07:07,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3086070.0, ans=0.0
2024-08-15 08:07:19,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3086170.0, ans=0.125
2024-08-15 08:07:28,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4300, loss[loss=0.1036, beats_loss=0.00968, ecapa_loss=0.0001691, whisper_loss=0.09227, over 22085.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001505, whisper_loss=0.0914, over 3885764.02 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:07:34,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.16 vs. limit=6.0
2024-08-15 08:07:43,734 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 from AS
2024-08-15 08:07:52,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3086370.0, ans=0.1
2024-08-15 08:07:53,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3086370.0, ans=0.0
2024-08-15 08:07:59,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3086470.0, ans=0.035
2024-08-15 08:08:00,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086470.0, ans=0.1
2024-08-15 08:08:05,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3086470.0, ans=0.125
2024-08-15 08:08:26,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3086670.0, ans=0.1
2024-08-15 08:08:33,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3086670.0, ans=0.07
2024-08-15 08:08:35,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.282e+01 2.452e+01 2.695e+01 5.506e+01, threshold=4.904e+01, percent-clipped=1.0
2024-08-15 08:08:38,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4350, loss[loss=0.09022, beats_loss=0.01168, ecapa_loss=0.0001665, whisper_loss=0.07687, over 14957.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001514, whisper_loss=0.09076, over 3884074.79 frames. ], batch size: 62, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:08:47,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2024-08-15 08:08:59,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.54 vs. limit=22.5
2024-08-15 08:09:25,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3087070.0, ans=0.125
2024-08-15 08:09:26,410 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 20 from Vox, 20 from AS
2024-08-15 08:09:26,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3087070.0, ans=0.09899494936611666
2024-08-15 08:09:46,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4400, loss[loss=0.1196, beats_loss=0.007376, ecapa_loss=0.0001659, whisper_loss=0.1106, over 21936.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001519, whisper_loss=0.0913, over 3900161.37 frames. ], batch size: 87, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:09:59,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087370.0, ans=0.1
2024-08-15 08:10:01,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3087370.0, ans=0.125
2024-08-15 08:10:52,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.331e+01 2.564e+01 2.900e+01 4.263e+01, threshold=5.127e+01, percent-clipped=0.0
2024-08-15 08:10:52,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3087670.0, ans=0.125
2024-08-15 08:10:55,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4450, loss[loss=0.09175, beats_loss=0.01093, ecapa_loss=0.0001421, whisper_loss=0.0794, over 18423.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001518, whisper_loss=0.09121, over 3900932.23 frames. ], batch size: 72, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:11:00,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0
2024-08-15 08:11:01,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0
2024-08-15 08:11:04,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3087770.0, ans=0.125
2024-08-15 08:11:08,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3087870.0, ans=0.125
2024-08-15 08:11:13,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3087870.0, ans=0.0
2024-08-15 08:11:22,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3087970.0, ans=0.125
2024-08-15 08:11:26,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3087970.0, ans=0.125
2024-08-15 08:11:34,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3087970.0, ans=0.125
2024-08-15 08:11:54,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3088170.0, ans=0.1
2024-08-15 08:11:56,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0
2024-08-15 08:12:05,039 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4500, loss[loss=0.1185, beats_loss=0.007788, ecapa_loss=0.0001348, whisper_loss=0.1094, over 13763.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001511, whisper_loss=0.09155, over 3886438.34 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:12:11,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3088270.0, ans=0.125
2024-08-15 08:12:19,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=12.0
2024-08-15 08:12:23,272 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 08:12:49,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3088570.0, ans=0.125
2024-08-15 08:12:51,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.53 vs. limit=22.5
2024-08-15 08:12:54,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3088570.0, ans=0.2
2024-08-15 08:12:54,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3088570.0, ans=0.0
2024-08-15 08:13:07,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3088670.0, ans=0.0
2024-08-15 08:13:11,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.272e+01 2.536e+01 2.739e+01 4.209e+01, threshold=5.072e+01, percent-clipped=0.0
2024-08-15 08:13:12,622 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 from AS
2024-08-15 08:13:13,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4550, loss[loss=0.09046, beats_loss=0.009241, ecapa_loss=0.0001781, whisper_loss=0.07944, over 15172.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001508, whisper_loss=0.09158, over 3887903.61 frames. ], batch size: 64, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:13:17,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.18 vs. limit=10.0
2024-08-15 08:13:22,675 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 08:13:25,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3088770.0, ans=0.0
2024-08-15 08:13:33,181 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 from AS
2024-08-15 08:13:37,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3088870.0, ans=0.0
2024-08-15 08:13:38,676 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 from AS
2024-08-15 08:13:47,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3088970.0, ans=0.0
2024-08-15 08:13:47,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3088970.0, ans=0.125
2024-08-15 08:13:50,882 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 32 from Vox, 30 from AS
2024-08-15 08:14:05,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0
2024-08-15 08:14:07,712 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 from AS
2024-08-15 08:14:23,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4600, loss[loss=0.1163, beats_loss=0.009091, ecapa_loss=0.0001517, whisper_loss=0.1056, over 19500.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01067, ecapa_loss=0.0001501, whisper_loss=0.09116, over 3894750.97 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:14:37,006 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 32 from Vox, 26 from AS
2024-08-15 08:14:42,626 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 08:14:44,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3089370.0, ans=0.125
2024-08-15 08:14:53,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=15.0
2024-08-15 08:15:05,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3089470.0, ans=0.125
2024-08-15 08:15:18,504 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 from AS
2024-08-15 08:15:25,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0
2024-08-15 08:15:34,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.352e+01 2.617e+01 2.995e+01 7.008e+01, threshold=5.233e+01, percent-clipped=1.0
2024-08-15 08:15:37,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4650, loss[loss=0.1035, beats_loss=0.01395, ecapa_loss=0.000122, whisper_loss=0.08828, over 22165.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.000151, whisper_loss=0.0904, over 3901291.03 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:15:38,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3089770.0, ans=0.0
2024-08-15 08:16:01,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3089870.0, ans=0.1
2024-08-15 08:16:05,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3089870.0, ans=0.1
2024-08-15 08:16:23,206 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 8 from Vox, 31 from AS
2024-08-15 08:16:30,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5
2024-08-15 08:16:41,203 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 08:16:54,899 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4700, loss[loss=0.109, beats_loss=0.01101, ecapa_loss=0.0001485, whisper_loss=0.09655, over 20953.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001518, whisper_loss=0.0904, over 3896838.25 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 1.152921504606847e+18
2024-08-15 08:16:58,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3090270.0, ans=0.2
2024-08-15 08:17:01,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0
2024-08-15 08:17:08,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3090270.0, ans=0.125
2024-08-15 08:17:14,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3090370.0, ans=0.0
2024-08-15 08:17:21,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3090370.0, ans=0.1
2024-08-15 08:17:28,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3090470.0, ans=0.0
2024-08-15 08:17:40,275 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 from AS
2024-08-15 08:17:56,292 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 from AS
2024-08-15 08:17:59,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3090670.0, ans=0.125
2024-08-15 08:18:01,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3090670.0, ans=0.09899494936611666
2024-08-15 08:18:03,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3090670.0, ans=0.0
2024-08-15 08:18:12,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.597e+01 3.087e+01 7.330e+01, threshold=5.194e+01, percent-clipped=1.0
2024-08-15 08:18:13,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.92 vs. limit=5.0
2024-08-15 08:18:15,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4750, loss[loss=0.1214, beats_loss=0.007305, ecapa_loss=0.0001851, whisper_loss=0.1122, over 18668.00 frames.
], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001513, whisper_loss=0.0917, over 3942164.54 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 1.152921504606847e+18 2024-08-15 08:18:25,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3090770.0, ans=0.1 2024-08-15 08:18:47,330 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 08:18:50,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3090970.0, ans=10.0 2024-08-15 08:19:04,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3091070.0, ans=0.0 2024-08-15 08:19:24,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3091170.0, ans=0.125 2024-08-15 08:19:32,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4800, loss[loss=0.09285, beats_loss=0.00975, ecapa_loss=0.0002014, whisper_loss=0.08109, over 17011.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01072, ecapa_loss=0.0001518, whisper_loss=0.09115, over 3904346.67 frames. ], batch size: 72, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:19:48,779 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 08:19:54,985 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 08:19:55,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3091370.0, ans=0.1 2024-08-15 08:19:55,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.80 vs. 
limit=15.0 2024-08-15 08:20:03,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3091470.0, ans=0.125 2024-08-15 08:20:03,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-15 08:20:09,559 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-15 08:20:40,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3091670.0, ans=0.0 2024-08-15 08:20:48,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3091670.0, ans=0.125 2024-08-15 08:20:50,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.389e+01 2.640e+01 2.976e+01 3.395e+02, threshold=5.281e+01, percent-clipped=5.0 2024-08-15 08:20:52,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4850, loss[loss=0.1123, beats_loss=0.009931, ecapa_loss=0.0001271, whisper_loss=0.1011, over 14656.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01071, ecapa_loss=0.0001527, whisper_loss=0.0912, over 3885986.32 frames. ], batch size: 55, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:20:57,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091770.0, ans=0.1 2024-08-15 08:21:16,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3091870.0, ans=0.09899494936611666 2024-08-15 08:21:16,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.04 vs. 
limit=22.5 2024-08-15 08:21:57,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3092170.0, ans=0.0 2024-08-15 08:21:58,467 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-15 08:22:06,151 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 08:22:10,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3092170.0, ans=0.125 2024-08-15 08:22:12,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4900, loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001559, whisper_loss=0.08971, over 19111.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01072, ecapa_loss=0.0001528, whisper_loss=0.09083, over 3872125.48 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:22:19,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3092270.0, ans=0.0 2024-08-15 08:22:58,090 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 08:23:03,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3092570.0, ans=0.07 2024-08-15 08:23:03,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.61 vs. 
limit=12.0 2024-08-15 08:23:04,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3092570.0, ans=0.125 2024-08-15 08:23:12,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3092570.0, ans=0.125 2024-08-15 08:23:15,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092670.0, ans=0.1 2024-08-15 08:23:30,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.221e+01 2.391e+01 2.667e+01 3.893e+01, threshold=4.783e+01, percent-clipped=0.0 2024-08-15 08:23:31,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3092770.0, ans=0.125 2024-08-15 08:23:32,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 4950, loss[loss=0.09916, beats_loss=0.01145, ecapa_loss=0.0001412, whisper_loss=0.08629, over 17446.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001518, whisper_loss=0.09062, over 3842472.98 frames. ], batch size: 70, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:23:45,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3092770.0, ans=0.1 2024-08-15 08:23:56,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-15 08:24:00,050 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 08:24:19,873 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 08:24:22,953 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 08:24:23,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3093070.0, ans=0.125 2024-08-15 08:24:36,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3093170.0, ans=0.2 2024-08-15 08:24:37,420 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 08:24:44,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3093170.0, ans=0.1 2024-08-15 08:24:48,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5000, loss[loss=0.09783, beats_loss=0.01284, ecapa_loss=0.0001573, whisper_loss=0.08342, over 23657.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001536, whisper_loss=0.09018, over 3818427.54 frames. ], batch size: 95, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:24:51,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093270.0, ans=0.1 2024-08-15 08:24:56,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3093270.0, ans=0.04949747468305833 2024-08-15 08:24:58,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3093270.0, ans=0.0 2024-08-15 08:24:58,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-15 08:25:05,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. 
limit=6.0 2024-08-15 08:25:07,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093370.0, ans=0.1 2024-08-15 08:25:54,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-15 08:26:02,099 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-15 08:26:07,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.349e+01 2.649e+01 2.919e+01 1.466e+02, threshold=5.298e+01, percent-clipped=4.0 2024-08-15 08:26:09,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5050, loss[loss=0.09862, beats_loss=0.008876, ecapa_loss=0.0001596, whisper_loss=0.08815, over 17860.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0107, ecapa_loss=0.0001531, whisper_loss=0.09001, over 3847527.42 frames. ], batch size: 72, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:26:10,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-08-15 08:26:16,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3093770.0, ans=0.125 2024-08-15 08:26:29,269 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-15 08:26:54,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3093970.0, ans=0.2 2024-08-15 08:27:04,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3094070.0, ans=0.0 2024-08-15 08:27:30,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5100, loss[loss=0.1308, beats_loss=0.009659, ecapa_loss=0.0001342, whisper_loss=0.1198, over 20517.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001521, whisper_loss=0.09054, over 3855266.70 frames. ], batch size: 80, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:27:39,681 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-15 08:27:47,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2024-08-15 08:28:03,612 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 08:28:25,041 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 08:28:29,574 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 08:28:30,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3094570.0, ans=0.125 2024-08-15 08:28:31,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3094570.0, ans=0.125 2024-08-15 08:28:45,780 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 08:28:50,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.353e+01 2.564e+01 3.013e+01 4.662e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-15 08:28:51,401 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5150, loss[loss=0.1174, beats_loss=0.01003, ecapa_loss=0.0001386, whisper_loss=0.106, over 23148.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001512, whisper_loss=0.09013, over 3867389.42 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:29:18,392 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 08:29:40,328 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 08:29:49,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3095070.0, ans=0.2 2024-08-15 08:30:00,235 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 08:30:12,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5200, loss[loss=0.1225, beats_loss=0.009851, ecapa_loss=0.0001159, whisper_loss=0.1115, over 19888.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001519, whisper_loss=0.09032, over 3847228.31 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:30:29,581 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 20 from LS+wenet, 32 from Vox, 45 fro AS 2024-08-15 08:30:40,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3095370.0, ans=0.125 2024-08-15 08:30:50,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. 
limit=15.0 2024-08-15 08:31:08,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3095570.0, ans=0.0 2024-08-15 08:31:19,281 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 08:31:31,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.296e+01 2.558e+01 2.889e+01 4.447e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-15 08:31:32,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5250, loss[loss=0.1256, beats_loss=0.007469, ecapa_loss=0.0001696, whisper_loss=0.1164, over 19795.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001519, whisper_loss=0.09026, over 3830106.16 frames. ], batch size: 77, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:31:33,862 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 08:31:44,457 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-15 08:32:01,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. limit=10.0 2024-08-15 08:32:05,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3095970.0, ans=0.125 2024-08-15 08:32:06,491 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-15 08:32:19,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3096070.0, ans=0.0 2024-08-15 08:32:35,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3096170.0, ans=0.0 2024-08-15 08:32:42,652 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 08:32:51,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5300, loss[loss=0.09783, beats_loss=0.01012, ecapa_loss=0.0001213, whisper_loss=0.0865, over 15458.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.000152, whisper_loss=0.09089, over 3832379.21 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:33:31,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-15 08:33:34,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3096470.0, ans=0.0 2024-08-15 08:33:46,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-15 08:34:11,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.309e+01 2.587e+01 2.813e+01 1.007e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-15 08:34:12,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5350, loss[loss=0.09395, beats_loss=0.01086, ecapa_loss=0.0001344, whisper_loss=0.08175, over 14672.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.0001531, whisper_loss=0.09095, over 3814795.88 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:34:13,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3096770.0, ans=0.125 2024-08-15 08:34:27,652 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 08:34:32,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3096870.0, ans=0.0 2024-08-15 08:34:53,920 WARNING [optim.py:496] (3/4) Scaling gradients by 0.02987569198012352, model_norm_threshold=51.74189376831055 2024-08-15 08:34:54,108 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.299e+06, grad_sumsq=1.297e+08, orig_rms_sq=1.001e-02 2024-08-15 08:35:01,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097070.0, ans=0.1 2024-08-15 08:35:06,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097070.0, ans=0.1 2024-08-15 08:35:11,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=12.0 2024-08-15 08:35:14,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3097070.0, ans=0.125 2024-08-15 08:35:14,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-15 08:35:15,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3097070.0, ans=0.125 2024-08-15 08:35:19,723 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-15 08:35:33,191 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
33 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 08:35:34,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5400, loss[loss=0.1276, beats_loss=0.01076, ecapa_loss=0.0001482, whisper_loss=0.1153, over 22571.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001531, whisper_loss=0.09056, over 3774064.56 frames. ], batch size: 85, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:35:40,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3097270.0, ans=0.0 2024-08-15 08:35:55,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3097370.0, ans=0.125 2024-08-15 08:36:18,343 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-15 08:36:44,857 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07409150153398514, model_norm_threshold=51.74189376831055 2024-08-15 08:36:45,027 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.947e+04, grad_sumsq=3.947e+04, orig_rms_sq=1.000e+00 2024-08-15 08:36:59,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.348e+01 2.686e+01 2.991e+01 1.732e+03, threshold=5.372e+01, percent-clipped=3.0 2024-08-15 08:37:00,925 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5450, loss[loss=0.1071, beats_loss=0.01121, ecapa_loss=0.0001115, whisper_loss=0.09478, over 19718.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001531, whisper_loss=0.09101, over 3792784.63 frames. ], batch size: 75, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:37:05,043 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 08:37:06,479 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 08:37:10,740 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 08:37:10,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3097770.0, ans=0.0 2024-08-15 08:37:17,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3097770.0, ans=0.125 2024-08-15 08:37:25,690 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 08:37:56,467 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 08:38:03,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3098070.0, ans=0.125 2024-08-15 08:38:19,126 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-15 08:38:43,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5500, loss[loss=0.09794, beats_loss=0.01099, ecapa_loss=0.0001453, whisper_loss=0.08549, over 19710.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001521, whisper_loss=0.0909, over 3815050.01 frames. ], batch size: 81, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:39:00,717 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-15 08:40:22,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3098670.0, ans=0.125 2024-08-15 08:40:24,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.188e+01 2.412e+01 2.681e+01 4.153e+01, threshold=4.824e+01, percent-clipped=0.0 2024-08-15 08:40:25,962 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-15 08:40:28,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5550, loss[loss=0.1299, beats_loss=0.008864, ecapa_loss=0.0001471, whisper_loss=0.1196, over 15797.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001521, whisper_loss=0.09081, over 3832584.98 frames. ], batch size: 58, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:40:29,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3098770.0, ans=0.125 2024-08-15 08:40:37,606 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-15 08:40:42,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3098770.0, ans=0.125 2024-08-15 08:41:38,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=12.0 2024-08-15 08:42:13,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3099170.0, ans=0.125 2024-08-15 08:42:29,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5600, loss[loss=0.08222, beats_loss=0.01115, ecapa_loss=0.0001442, whisper_loss=0.06963, over 20238.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001511, whisper_loss=0.09095, over 3850394.07 frames. 
], batch size: 82, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:42:41,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3099270.0, ans=0.125 2024-08-15 08:42:46,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099270.0, ans=0.1 2024-08-15 08:43:07,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099370.0, ans=0.1 2024-08-15 08:43:26,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3099470.0, ans=15.0 2024-08-15 08:43:48,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2024-08-15 08:43:48,656 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-15 08:44:35,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.363e+01 2.562e+01 2.966e+01 4.681e+01, threshold=5.125e+01, percent-clipped=0.0 2024-08-15 08:44:36,801 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5650, loss[loss=0.09651, beats_loss=0.01218, ecapa_loss=0.0001142, whisper_loss=0.08319, over 23055.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001508, whisper_loss=0.09025, over 3877589.92 frames. 
], batch size: 88, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:45:32,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3099970.0, ans=0.125 2024-08-15 08:45:39,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3099970.0, ans=0.125 2024-08-15 08:45:44,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3099970.0, ans=0.125 2024-08-15 08:45:45,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3099970.0, ans=0.125 2024-08-15 08:46:17,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3100170.0, ans=0.2 2024-08-15 08:46:23,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5700, loss[loss=0.1145, beats_loss=0.009596, ecapa_loss=0.0001504, whisper_loss=0.1034, over 22751.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001513, whisper_loss=0.09056, over 3870031.90 frames. ], batch size: 90, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:46:26,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3100270.0, ans=0.0 2024-08-15 08:46:30,258 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-15 08:46:33,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3100270.0, ans=0.04949747468305833 2024-08-15 08:46:41,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3100370.0, ans=0.1 2024-08-15 08:46:59,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3100470.0, ans=15.0 2024-08-15 08:47:01,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3100470.0, ans=0.0 2024-08-15 08:47:17,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3100570.0, ans=0.1 2024-08-15 08:47:27,537 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-15 08:47:41,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-08-15 08:47:43,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.415e+01 2.624e+01 2.975e+01 2.275e+02, threshold=5.249e+01, percent-clipped=3.0 2024-08-15 08:47:44,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5750, loss[loss=0.09915, beats_loss=0.009639, ecapa_loss=0.0001923, whisper_loss=0.08759, over 20123.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001534, whisper_loss=0.09055, over 3904290.52 frames. ], batch size: 87, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:47:48,512 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-15 08:47:52,657 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 08:48:27,573 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 08:48:33,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3101070.0, ans=0.1 2024-08-15 08:48:44,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3101070.0, ans=0.0 2024-08-15 08:48:55,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3101170.0, ans=0.95 2024-08-15 08:48:57,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=12.0 2024-08-15 08:49:05,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5800, loss[loss=0.08257, beats_loss=0.01242, ecapa_loss=0.0001245, whisper_loss=0.0689, over 17250.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001533, whisper_loss=0.09024, over 3897048.36 frames. ], batch size: 69, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:49:05,739 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 08:49:14,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3101270.0, ans=0.125 2024-08-15 08:49:33,400 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 08:49:33,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3101370.0, ans=0.125 2024-08-15 08:49:34,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101370.0, ans=0.125 2024-08-15 08:49:50,766 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 08:49:54,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3101570.0, ans=0.125 2024-08-15 08:50:10,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3101670.0, ans=0.125 2024-08-15 08:50:13,904 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 08:50:20,486 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 08:50:23,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.392e+01 2.742e+01 3.107e+01 2.079e+02, threshold=5.485e+01, percent-clipped=4.0 2024-08-15 08:50:24,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101770.0, ans=0.125 2024-08-15 08:50:25,284 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5850, loss[loss=0.1066, beats_loss=0.01163, ecapa_loss=0.0001599, whisper_loss=0.09336, over 22980.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001541, whisper_loss=0.09065, over 3916528.88 frames. 
], batch size: 92, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:50:31,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3101770.0, ans=0.1 2024-08-15 08:50:42,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3101870.0, ans=0.125 2024-08-15 08:50:47,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3101870.0, ans=0.0 2024-08-15 08:50:56,860 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-15 08:50:58,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3101970.0, ans=0.1 2024-08-15 08:50:58,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3101970.0, ans=0.125 2024-08-15 08:51:04,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3101970.0, ans=0.0 2024-08-15 08:51:06,904 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 08:51:14,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102070.0, ans=0.1 2024-08-15 08:51:24,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-15 08:51:44,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5900, loss[loss=0.1017, beats_loss=0.007655, ecapa_loss=0.0001434, whisper_loss=0.09266, over 16468.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001539, whisper_loss=0.09022, over 3887599.71 frames. 
], batch size: 63, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:51:47,233 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 10 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 08:51:53,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3102270.0, ans=0.0 2024-08-15 08:51:54,451 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-15 08:52:07,425 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 08:52:09,050 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 08:52:15,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3102470.0, ans=0.125 2024-08-15 08:52:17,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5 2024-08-15 08:52:35,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3102570.0, ans=0.05 2024-08-15 08:52:40,293 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 08:52:42,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102570.0, ans=0.1 2024-08-15 08:52:43,543 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 08:52:59,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.266e+01 2.479e+01 2.808e+01 3.444e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-15 08:53:01,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 5950, loss[loss=0.1106, beats_loss=0.01043, ecapa_loss=0.0001372, whisper_loss=0.09879, over 14164.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001537, whisper_loss=0.08985, over 3874593.50 frames. ], batch size: 54, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:53:09,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3102770.0, ans=0.2 2024-08-15 08:53:20,070 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 08:53:32,156 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-15 08:53:47,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3103070.0, ans=0.0 2024-08-15 08:53:49,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103070.0, ans=0.1 2024-08-15 08:53:52,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3103070.0, ans=0.0 2024-08-15 08:53:56,542 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 08:54:03,914 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 08:54:08,174 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 08:54:18,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6000, loss[loss=0.1001, beats_loss=0.01141, ecapa_loss=0.0001391, whisper_loss=0.08725, over 16314.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001536, whisper_loss=0.09041, over 3871702.66 frames. ], batch size: 63, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:54:18,676 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 08:54:59,204 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005326, whisper_loss=0.2471, over 922467.00 frames. 2024-08-15 08:55:14,754 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on SV_voxceleb1: loss=0.004204, beats_loss=0, ecapa_loss=0.0004204, whisper_loss=0, over 939242.00 frames. 2024-08-15 08:57:14,010 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 08:57:14,020 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 08:57:19,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103270.0, ans=0.1 2024-08-15 08:57:28,086 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 08:57:38,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3103370.0, ans=0.0 2024-08-15 08:57:40,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3103370.0, ans=0.0 2024-08-15 08:57:46,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3103470.0, ans=0.07 2024-08-15 08:57:55,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-15 08:58:05,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2024-08-15 08:58:21,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3103670.0, ans=0.1 2024-08-15 08:58:22,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.13 vs. limit=10.0 2024-08-15 08:58:23,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3103670.0, ans=0.0 2024-08-15 08:58:28,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.350e+01 2.585e+01 2.886e+01 6.077e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-15 08:58:30,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6050, loss[loss=0.1231, beats_loss=0.008865, ecapa_loss=0.0001897, whisper_loss=0.1124, over 18895.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001536, whisper_loss=0.09109, over 3865053.06 frames. 
], batch size: 78, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:59:16,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3104070.0, ans=0.2 2024-08-15 08:59:17,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=8.0 2024-08-15 08:59:18,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3104070.0, ans=0.125 2024-08-15 08:59:40,186 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 08:59:45,329 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6100, loss[loss=0.1049, beats_loss=0.009538, ecapa_loss=0.0001717, whisper_loss=0.09369, over 17623.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001521, whisper_loss=0.09081, over 3846688.63 frames. ], batch size: 69, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 08:59:53,271 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 09:00:00,491 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 09:00:02,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3104370.0, ans=0.0 2024-08-15 09:00:02,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-15 09:00:06,141 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 23 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-15 09:00:06,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3104370.0, ans=0.125 2024-08-15 09:00:21,269 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-15 09:00:21,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3104470.0, ans=10.0 2024-08-15 09:00:30,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104570.0, ans=0.1 2024-08-15 09:00:32,822 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-15 09:00:51,599 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-15 09:00:54,640 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 09:00:57,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.233e+01 2.517e+01 2.744e+01 4.126e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-15 09:00:58,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6150, loss[loss=0.1086, beats_loss=0.009448, ecapa_loss=0.0001598, whisper_loss=0.09751, over 21836.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01068, ecapa_loss=0.0001515, whisper_loss=0.09084, over 3850766.55 frames. ], batch size: 87, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:01:00,389 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 09:01:19,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3104870.0, ans=0.125 2024-08-15 09:01:27,733 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 09:01:29,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3104970.0, ans=0.0 2024-08-15 09:01:31,879 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
16 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-15 09:01:33,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3104970.0, ans=0.125 2024-08-15 09:01:37,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3104970.0, ans=0.0 2024-08-15 09:01:48,332 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 09:02:13,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6200, loss[loss=0.09294, beats_loss=0.01386, ecapa_loss=0.0001306, whisper_loss=0.07778, over 22403.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001503, whisper_loss=0.09065, over 3867196.25 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:02:32,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3105370.0, ans=0.125 2024-08-15 09:02:33,140 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 09:02:46,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3105470.0, ans=0.125 2024-08-15 09:02:50,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3105470.0, ans=0.05 2024-08-15 09:02:53,130 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-15 09:02:57,601 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.888e-02 2024-08-15 09:03:13,088 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 16 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 09:03:15,043 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-15 09:03:26,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.264e+01 2.437e+01 2.763e+01 4.898e+01, threshold=4.875e+01, percent-clipped=0.0 2024-08-15 09:03:28,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6250, loss[loss=0.08094, beats_loss=0.01072, ecapa_loss=0.0001821, whisper_loss=0.0684, over 15164.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001503, whisper_loss=0.09026, over 3847492.51 frames. ], batch size: 64, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:03:30,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3105770.0, ans=0.125 2024-08-15 09:03:33,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3105770.0, ans=0.2 2024-08-15 09:03:37,957 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 09:03:41,784 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 09:03:44,749 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 09:03:54,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=22.5 2024-08-15 09:03:55,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3105870.0, ans=0.125 2024-08-15 09:03:57,094 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 09:03:57,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. 
limit=15.0 2024-08-15 09:03:58,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3105970.0, ans=0.0 2024-08-15 09:04:01,575 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 09:04:09,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3105970.0, ans=0.2 2024-08-15 09:04:45,425 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6300, loss[loss=0.09292, beats_loss=0.01254, ecapa_loss=0.0001439, whisper_loss=0.07895, over 17592.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001518, whisper_loss=0.0913, over 3860007.93 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:04:50,467 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 09:04:54,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-15 09:05:46,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3106570.0, ans=0.125 2024-08-15 09:05:51,866 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 09:05:52,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3106670.0, ans=0.1 2024-08-15 09:05:53,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. 
limit=15.0 2024-08-15 09:05:57,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3106670.0, ans=0.125 2024-08-15 09:05:59,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3106670.0, ans=0.0 2024-08-15 09:06:02,398 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-15 09:06:07,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.627e+01 2.982e+01 5.649e+01, threshold=5.254e+01, percent-clipped=1.0 2024-08-15 09:06:09,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6350, loss[loss=0.09762, beats_loss=0.01, ecapa_loss=0.0001566, whisper_loss=0.08605, over 18056.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.09102, over 3862444.91 frames. ], batch size: 73, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:06:43,827 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-15 09:07:19,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=12.0 2024-08-15 09:07:24,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3107170.0, ans=0.0 2024-08-15 09:07:30,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6400, loss[loss=0.07952, beats_loss=0.01265, ecapa_loss=0.0001706, whisper_loss=0.06516, over 22150.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01066, ecapa_loss=0.0001507, whisper_loss=0.09132, over 3894296.16 frames. 
], batch size: 94, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:07:33,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3107270.0, ans=0.1 2024-08-15 09:07:44,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3107270.0, ans=0.125 2024-08-15 09:07:47,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3107370.0, ans=0.125 2024-08-15 09:08:05,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3107470.0, ans=0.125 2024-08-15 09:08:08,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-15 09:08:10,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3107470.0, ans=0.0 2024-08-15 09:08:33,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3107570.0, ans=0.1 2024-08-15 09:08:41,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3107670.0, ans=0.125 2024-08-15 09:08:50,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2024-08-15 09:08:52,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.320e+01 2.533e+01 2.838e+01 5.335e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-15 09:08:54,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6450, loss[loss=0.08895, beats_loss=0.01176, ecapa_loss=0.0001309, whisper_loss=0.07588, over 20999.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.000151, whisper_loss=0.09184, over 3914200.27 frames. ], batch size: 84, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:08:57,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3107770.0, ans=0.0 2024-08-15 09:08:59,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3107770.0, ans=0.0 2024-08-15 09:09:04,200 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-15 09:09:11,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3107870.0, ans=0.0 2024-08-15 09:09:15,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3107870.0, ans=0.2 2024-08-15 09:09:18,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3107870.0, ans=0.125 2024-08-15 09:09:23,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=8.0 2024-08-15 09:09:32,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3107970.0, ans=0.0 2024-08-15 09:10:16,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6500, loss[loss=0.1154, beats_loss=0.01113, ecapa_loss=0.0001178, whisper_loss=0.1031, over 22369.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0106, ecapa_loss=0.000151, whisper_loss=0.09217, over 3922745.84 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:10:27,265 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
14 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 09:10:50,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3108470.0, ans=0.0 2024-08-15 09:11:16,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-15 09:11:28,858 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 09:11:30,119 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 09:11:33,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.377e+01 2.603e+01 2.970e+01 3.973e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-15 09:11:34,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6550, loss[loss=0.1151, beats_loss=0.01092, ecapa_loss=0.0001181, whisper_loss=0.103, over 16792.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01059, ecapa_loss=0.000151, whisper_loss=0.0922, over 3913632.56 frames. ], batch size: 64, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:11:35,028 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 09:11:45,713 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 09:11:47,783 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:12:01,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3108870.0, ans=0.2 2024-08-15 09:12:14,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3108970.0, ans=0.0 2024-08-15 09:12:34,200 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 09:12:34,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3109070.0, ans=0.0 2024-08-15 09:12:42,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3109170.0, ans=0.125 2024-08-15 09:12:53,751 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6600, loss[loss=0.1049, beats_loss=0.01052, ecapa_loss=0.0001465, whisper_loss=0.0929, over 23150.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01051, ecapa_loss=0.0001519, whisper_loss=0.09272, over 3928822.22 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:13:12,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3109370.0, ans=0.125 2024-08-15 09:13:46,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3109570.0, ans=0.1 2024-08-15 09:13:49,186 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
20 from LS+wenet, 17 from Vox, 23 from AS 2024-08-15 09:13:50,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109570.0, ans=0.1 2024-08-15 09:13:55,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3109670.0, ans=0.2 2024-08-15 09:13:57,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3109670.0, ans=0.125 2024-08-15 09:13:58,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3109670.0, ans=0.1 2024-08-15 09:14:03,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109670.0, ans=0.1 2024-08-15 09:14:08,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3109670.0, ans=0.0 2024-08-15 09:14:10,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.330e+01 2.492e+01 2.798e+01 4.030e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 09:14:10,564 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 from AS 2024-08-15 09:14:11,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6650, loss[loss=0.1067, beats_loss=0.01003, ecapa_loss=0.0001553, whisper_loss=0.09512, over 18740.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01049, ecapa_loss=0.0001528, whisper_loss=0.09262, over 3919356.31 frames. 
], batch size: 76, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:14:17,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3109770.0, ans=0.0 2024-08-15 09:14:24,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3109770.0, ans=0.125 2024-08-15 09:14:25,196 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS 2024-08-15 09:14:29,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3109870.0, ans=0.125 2024-08-15 09:14:33,509 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 28 from Vox, 42 from AS 2024-08-15 09:14:34,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=12.0 2024-08-15 09:14:46,638 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 09:14:47,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3109970.0, ans=0.0 2024-08-15 09:14:53,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3109970.0, ans=0.2 2024-08-15 09:14:58,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109970.0, ans=0.1 2024-08-15 09:15:09,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2024-08-15 09:15:11,474 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
27 from LS+wenet, 9 from Vox, 27 from AS 2024-08-15 09:15:31,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6700, loss[loss=0.1097, beats_loss=0.0097, ecapa_loss=0.0001528, whisper_loss=0.09844, over 17766.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01049, ecapa_loss=0.0001523, whisper_loss=0.09252, over 3899757.88 frames. ], batch size: 73, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:15:40,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3110270.0, ans=0.125 2024-08-15 09:15:58,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3110370.0, ans=0.2 2024-08-15 09:16:07,071 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 09:16:08,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3110470.0, ans=0.125 2024-08-15 09:16:08,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3110470.0, ans=0.125 2024-08-15 09:16:29,154 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 17 from Vox, 23 from AS 2024-08-15 09:16:41,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3110670.0, ans=0.125 2024-08-15 09:16:45,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3110670.0, ans=0.125 2024-08-15 09:16:55,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.348e+01 2.579e+01 2.866e+01 4.401e+01, threshold=5.159e+01, percent-clipped=0.0 2024-08-15 09:16:56,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6750, loss[loss=0.09305, beats_loss=0.009643, ecapa_loss=0.000151, whisper_loss=0.08189, over 17409.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001508, whisper_loss=0.09166, over 3888004.04 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:17:01,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3110770.0, ans=0.125 2024-08-15 09:17:09,594 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 14 from LS+wenet, 25 from Vox, 34 from AS 2024-08-15 09:17:32,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3110970.0, ans=0.0 2024-08-15 09:17:32,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3110970.0, ans=0.2 2024-08-15 09:17:50,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3111070.0, ans=0.125 2024-08-15 09:17:53,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3111070.0, ans=0.125 2024-08-15 09:18:12,262 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
29 from LS+wenet, 21 from Vox, 26 from AS 2024-08-15 09:18:13,736 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 11 from Vox, 32 from AS 2024-08-15 09:18:21,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6800, loss[loss=0.1103, beats_loss=0.008865, ecapa_loss=0.0001539, whisper_loss=0.09989, over 22460.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001499, whisper_loss=0.09171, over 3887303.86 frames. ], batch size: 88, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:18:24,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3111270.0, ans=0.1 2024-08-15 09:18:36,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2024-08-15 09:18:50,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-15 09:18:51,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3111370.0, ans=0.125 2024-08-15 09:18:53,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3111370.0, ans=15.0 2024-08-15 09:19:14,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3111570.0, ans=0.125 2024-08-15 09:19:15,209 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 from AS 2024-08-15 09:19:18,131 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 from AS 2024-08-15 09:19:35,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3111670.0, ans=0.125 2024-08-15 09:19:36,818 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 09:19:41,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.366e+01 2.737e+01 3.020e+01 4.133e+01, threshold=5.473e+01, percent-clipped=0.0 2024-08-15 09:19:43,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6850, loss[loss=0.113, beats_loss=0.01078, ecapa_loss=0.0001368, whisper_loss=0.1008, over 17833.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01054, ecapa_loss=0.00015, whisper_loss=0.09209, over 3876192.44 frames. ], batch size: 67, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:19:48,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111770.0, ans=0.1 2024-08-15 09:19:56,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111770.0, ans=0.1 2024-08-15 09:19:56,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3111770.0, ans=0.125 2024-08-15 09:20:24,848 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 19 from Vox, 44 from AS 2024-08-15 09:20:30,897 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 20 from Vox, 43 from AS 2024-08-15 09:20:39,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3112070.0, ans=0.2 2024-08-15 09:20:59,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3112170.0, ans=0.0 2024-08-15 09:21:04,790 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:21:05,633 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6900, loss[loss=0.09006, beats_loss=0.01334, ecapa_loss=0.000139, whisper_loss=0.07533, over 19607.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01062, ecapa_loss=0.0001507, whisper_loss=0.0917, over 3842898.61 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:21:09,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3112270.0, ans=0.2 2024-08-15 09:21:21,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3112370.0, ans=0.0 2024-08-15 09:21:31,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3112370.0, ans=0.2 2024-08-15 09:21:41,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3112470.0, ans=0.07 2024-08-15 09:21:49,712 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 from AS 2024-08-15 09:21:57,497 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
23 from LS+wenet, 16 from Vox, 17 from AS 2024-08-15 09:22:18,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3112670.0, ans=0.09899494936611666 2024-08-15 09:22:24,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3112670.0, ans=0.125 2024-08-15 09:22:25,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.330e+01 2.607e+01 2.906e+01 3.903e+01, threshold=5.213e+01, percent-clipped=0.0 2024-08-15 09:22:27,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 6950, loss[loss=0.1111, beats_loss=0.0114, ecapa_loss=0.0001475, whisper_loss=0.09824, over 22813.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001496, whisper_loss=0.09077, over 3844394.54 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:22:35,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3112770.0, ans=0.125 2024-08-15 09:22:56,691 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 18 from LS+wenet, 19 from Vox, 44 from AS 2024-08-15 09:23:04,801 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS 2024-08-15 09:23:07,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3112970.0, ans=0.1 2024-08-15 09:23:09,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3112970.0, ans=0.2 2024-08-15 09:23:31,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3113070.0, ans=0.125 2024-08-15 09:23:34,578 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 13 from Vox, 36 from AS 2024-08-15 09:23:48,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3113170.0, ans=0.125 2024-08-15 09:23:49,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3113170.0, ans=0.125 2024-08-15 09:23:53,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7000, loss[loss=0.1011, beats_loss=0.01202, ecapa_loss=0.0001453, whisper_loss=0.08761, over 18124.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01074, ecapa_loss=0.0001482, whisper_loss=0.09056, over 3821867.87 frames. ], batch size: 73, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:24:00,483 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 09:24:06,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3113270.0, ans=0.0 2024-08-15 09:24:10,936 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:24:16,191 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 32 from LS+wenet, 24 from Vox, 25 from AS 2024-08-15 09:24:19,172 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 from AS 2024-08-15 09:24:21,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3113370.0, ans=0.125 2024-08-15 09:24:27,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-15 09:24:31,689 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 15 from Vox, 36 from AS 2024-08-15 09:24:32,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3113470.0, ans=0.0 2024-08-15 09:24:35,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3113470.0, ans=0.0 2024-08-15 09:24:58,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3113670.0, ans=0.2 2024-08-15 09:25:03,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3113670.0, ans=0.2 2024-08-15 09:25:04,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3113670.0, ans=0.125 2024-08-15 09:25:11,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3113670.0, ans=0.02 2024-08-15 09:25:11,773 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.285e+01 2.515e+01 2.817e+01 4.322e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-15 09:25:12,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3113770.0, ans=0.125 2024-08-15 09:25:13,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7050, loss[loss=0.07806, beats_loss=0.01383, ecapa_loss=0.000117, whisper_loss=0.06306, over 19572.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001488, whisper_loss=0.09081, over 3849766.33 frames. ], batch size: 79, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:25:31,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3113870.0, ans=0.0 2024-08-15 09:25:45,399 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 22 from Vox, 43 from AS 2024-08-15 09:25:52,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=12.0 2024-08-15 09:26:09,634 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 from AS 2024-08-15 09:26:13,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3114070.0, ans=0.125 2024-08-15 09:26:31,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114170.0, ans=0.1 2024-08-15 09:26:31,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.02 vs. limit=10.0 2024-08-15 09:26:37,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7100, loss[loss=0.1129, beats_loss=0.009253, ecapa_loss=0.0001852, whisper_loss=0.1018, over 14136.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01073, ecapa_loss=0.0001493, whisper_loss=0.09055, over 3856231.15 frames. ], batch size: 59, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:26:40,570 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 from AS 2024-08-15 09:26:47,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3114270.0, ans=0.125 2024-08-15 09:27:02,567 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 14 from Vox, 27 from AS 2024-08-15 09:27:02,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3114370.0, ans=0.0 2024-08-15 09:27:04,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3114370.0, ans=0.0 2024-08-15 09:27:05,198 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 17 from Vox, 38 from AS 2024-08-15 09:27:13,830 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:27:14,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-15 09:27:16,807 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 09:27:32,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-15 09:27:37,079 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 09:27:39,978 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 from AS 2024-08-15 09:27:53,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.259e+01 2.510e+01 2.858e+01 3.355e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-15 09:27:54,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-15 09:27:55,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7150, loss[loss=0.09188, beats_loss=0.0121, ecapa_loss=0.0001385, whisper_loss=0.0784, over 16482.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01073, ecapa_loss=0.0001493, whisper_loss=0.09078, over 3869004.82 frames. ], batch size: 66, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:27:57,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114770.0, ans=0.1 2024-08-15 09:28:50,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3115070.0, ans=0.125 2024-08-15 09:29:08,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3115170.0, ans=0.125 2024-08-15 09:29:18,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3115270.0, ans=0.2 2024-08-15 09:29:18,983 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7200, loss[loss=0.1187, beats_loss=0.01036, ecapa_loss=0.0001036, whisper_loss=0.1073, over 18466.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001493, whisper_loss=0.09018, over 3879839.38 frames. ], batch size: 70, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:29:34,604 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 24 from Vox, 28 from AS 2024-08-15 09:29:38,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=12.0 2024-08-15 09:29:52,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3115370.0, ans=0.125 2024-08-15 09:29:52,968 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
31 from LS+wenet, 19 from Vox, 24 from AS 2024-08-15 09:30:00,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3115470.0, ans=0.0 2024-08-15 09:30:11,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:18,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3115570.0, ans=0.125 2024-08-15 09:30:41,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.339e+01 2.550e+01 2.963e+01 5.481e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-15 09:30:43,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7250, loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.000128, whisper_loss=0.09226, over 19753.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001499, whisper_loss=0.09064, over 3886931.46 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:30:45,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3115770.0, ans=0.125 2024-08-15 09:30:52,640 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS 2024-08-15 09:30:54,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3115770.0, ans=0.07 2024-08-15 09:30:56,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-15 09:30:59,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.26 vs. 
limit=15.0 2024-08-15 09:31:03,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3115870.0, ans=0.0 2024-08-15 09:31:41,741 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 from AS 2024-08-15 09:31:43,197 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 09:31:49,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3116170.0, ans=0.125 2024-08-15 09:31:52,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.77 vs. limit=15.0 2024-08-15 09:32:02,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7300, loss[loss=0.1051, beats_loss=0.01033, ecapa_loss=0.0001696, whisper_loss=0.09309, over 21308.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001511, whisper_loss=0.09064, over 3871288.97 frames. ], batch size: 88, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:32:05,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3116270.0, ans=0.125 2024-08-15 09:32:07,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3116270.0, ans=0.1 2024-08-15 09:32:08,148 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 from AS 2024-08-15 09:32:29,910 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 09:32:47,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3116470.0, ans=0.0 2024-08-15 09:32:48,255 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
33 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 09:32:49,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3116570.0, ans=0.0 2024-08-15 09:33:00,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-08-15 09:33:15,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3116670.0, ans=0.07 2024-08-15 09:33:18,040 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 31 from Vox, 31 from AS 2024-08-15 09:33:20,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.398e+01 2.645e+01 3.010e+01 2.880e+02, threshold=5.290e+01, percent-clipped=2.0 2024-08-15 09:33:22,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7350, loss[loss=0.114, beats_loss=0.01125, ecapa_loss=0.0001336, whisper_loss=0.1014, over 23378.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001515, whisper_loss=0.09052, over 3872325.90 frames. ], batch size: 92, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:33:26,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.57 vs. 
limit=15.0 2024-08-15 09:33:33,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3116770.0, ans=0.0 2024-08-15 09:33:40,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3116870.0, ans=0.0 2024-08-15 09:33:52,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3116970.0, ans=0.2 2024-08-15 09:33:52,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116970.0, ans=0.1 2024-08-15 09:33:53,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3116970.0, ans=0.0 2024-08-15 09:33:56,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3116970.0, ans=0.125 2024-08-15 09:34:10,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3117070.0, ans=0.125 2024-08-15 09:34:24,636 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 17 from Vox, 27 from AS 2024-08-15 09:34:35,120 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 from AS 2024-08-15 09:34:39,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7400, loss[loss=0.08835, beats_loss=0.01194, ecapa_loss=0.000142, whisper_loss=0.07499, over 16684.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001515, whisper_loss=0.08994, over 3869701.48 frames. 
], batch size: 68, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:34:44,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3117270.0, ans=0.0 2024-08-15 09:34:59,904 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 from AS 2024-08-15 09:35:18,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3117470.0, ans=0.0 2024-08-15 09:35:33,623 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 from AS 2024-08-15 09:35:39,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3117570.0, ans=0.125 2024-08-15 09:35:54,817 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 09:35:59,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.404e+01 2.626e+01 2.946e+01 5.024e+01, threshold=5.253e+01, percent-clipped=0.0 2024-08-15 09:35:59,779 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS 2024-08-15 09:36:01,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7450, loss[loss=0.109, beats_loss=0.009902, ecapa_loss=0.0001446, whisper_loss=0.0976, over 21825.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001514, whisper_loss=0.09009, over 3878255.92 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 1.152921504606847e+18 2024-08-15 09:36:20,943 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
25 from LS+wenet, 19 from Vox, 36 from AS 2024-08-15 09:36:22,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3117870.0, ans=0.125 2024-08-15 09:36:26,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3117870.0, ans=0.0 2024-08-15 09:36:27,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3117870.0, ans=0.05 2024-08-15 09:36:51,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2024-08-15 09:37:01,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3118070.0, ans=0.125 2024-08-15 09:37:17,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7500, loss[loss=0.109, beats_loss=0.008443, ecapa_loss=0.0001436, whisper_loss=0.0991, over 16522.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001515, whisper_loss=0.08967, over 3892394.97 frames. ], batch size: 62, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:37:20,777 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 09:37:23,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118270.0, ans=0.1 2024-08-15 09:37:54,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3118470.0, ans=0.125 2024-08-15 09:38:32,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.370e+01 2.711e+01 2.994e+01 4.451e+02, threshold=5.422e+01, percent-clipped=4.0 2024-08-15 09:38:32,537 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7550, loss[loss=0.1091, beats_loss=0.0107, ecapa_loss=0.0001447, whisper_loss=0.09695, over 23365.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001517, whisper_loss=0.08962, over 3857189.42 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:38:45,408 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 18 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-15 09:39:11,557 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-15 09:39:28,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2024-08-15 09:39:39,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3119170.0, ans=0.125 2024-08-15 09:39:52,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-15 09:39:52,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7600, loss[loss=0.08472, beats_loss=0.01224, ecapa_loss=0.0001414, whisper_loss=0.07107, over 18021.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01066, ecapa_loss=0.0001521, whisper_loss=0.08937, over 3835036.26 frames. ], batch size: 71, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:39:55,007 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:39:56,035 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 09:40:00,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3119270.0, ans=0.125 2024-08-15 09:40:21,003 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 09:40:26,799 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-15 09:40:40,432 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-15 09:40:49,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3119570.0, ans=0.125 2024-08-15 09:40:50,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3119570.0, ans=0.2 2024-08-15 09:40:57,754 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-15 09:41:09,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.254e+01 2.450e+01 2.637e+01 4.565e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-15 09:41:09,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7650, loss[loss=0.1064, beats_loss=0.01163, ecapa_loss=0.0001303, whisper_loss=0.0935, over 22638.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001518, whisper_loss=0.08937, over 3851615.26 frames. 
], batch size: 92, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:41:10,047 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-15 09:41:30,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3119870.0, ans=0.125 2024-08-15 09:41:46,889 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.254e+00 2024-08-15 09:41:48,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3119970.0, ans=0.2 2024-08-15 09:41:55,746 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 09:42:12,629 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-15 09:42:16,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3120170.0, ans=0.07 2024-08-15 09:42:25,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=15.0 2024-08-15 09:42:27,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7700, loss[loss=0.11, beats_loss=0.01124, ecapa_loss=0.0001076, whisper_loss=0.09771, over 17405.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.0001516, whisper_loss=0.08916, over 3868508.43 frames. ], batch size: 64, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:42:34,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2024-08-15 09:42:35,134 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-15 09:42:48,144 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
39 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 09:42:59,953 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 09:43:21,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3120570.0, ans=0.0 2024-08-15 09:43:28,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3120570.0, ans=0.2 2024-08-15 09:43:34,647 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-15 09:43:35,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3120670.0, ans=0.0 2024-08-15 09:43:45,322 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 09:43:46,311 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:43:47,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.349e+01 2.670e+01 3.107e+01 2.208e+02, threshold=5.341e+01, percent-clipped=1.0 2024-08-15 09:43:47,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7750, loss[loss=0.1159, beats_loss=0.01123, ecapa_loss=0.0001368, whisper_loss=0.1033, over 22272.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001515, whisper_loss=0.08923, over 3867880.92 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:43:50,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.74 vs. 
limit=15.0 2024-08-15 09:43:51,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3120770.0, ans=0.1 2024-08-15 09:43:52,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3120770.0, ans=0.125 2024-08-15 09:43:59,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=22.5 2024-08-15 09:44:09,408 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 09:44:13,986 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 09:44:15,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3120870.0, ans=0.125 2024-08-15 09:44:21,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3120970.0, ans=0.2 2024-08-15 09:44:28,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2024-08-15 09:44:29,913 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 09:44:30,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3120970.0, ans=0.125 2024-08-15 09:44:41,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3121070.0, ans=0.125 2024-08-15 09:45:04,595 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-15 09:45:05,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7800, loss[loss=0.09136, beats_loss=0.01217, ecapa_loss=0.0001597, whisper_loss=0.07759, over 21808.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.000152, whisper_loss=0.08968, over 3883724.59 frames. ], batch size: 93, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:45:11,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3121270.0, ans=0.0 2024-08-15 09:45:14,954 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-15 09:45:36,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-15 09:45:59,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3121570.0, ans=0.125 2024-08-15 09:46:01,486 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-15 09:46:20,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=22.5 2024-08-15 09:46:25,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.314e+01 2.626e+01 3.075e+01 3.695e+02, threshold=5.252e+01, percent-clipped=2.0 2024-08-15 09:46:25,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7850, loss[loss=0.09631, beats_loss=0.01394, ecapa_loss=0.0001058, whisper_loss=0.08132, over 22204.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.0001513, whisper_loss=0.0901, over 3943655.07 frames. 
], batch size: 87, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:46:35,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3121770.0, ans=0.2 2024-08-15 09:46:38,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3121870.0, ans=0.1 2024-08-15 09:46:43,432 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-15 09:46:46,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3121870.0, ans=0.035 2024-08-15 09:46:48,957 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 09:46:50,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3121870.0, ans=0.125 2024-08-15 09:46:51,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2024-08-15 09:47:25,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3122170.0, ans=0.2 2024-08-15 09:47:33,742 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7900, loss[loss=0.1033, beats_loss=0.01167, ecapa_loss=0.0001477, whisper_loss=0.09017, over 18770.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.000152, whisper_loss=0.09072, over 3926742.97 frames. ], batch size: 76, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:47:34,216 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 09:47:51,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3122370.0, ans=0.125 2024-08-15 09:48:00,101 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-15 09:48:19,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3122570.0, ans=0.125 2024-08-15 09:48:22,468 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-15 09:48:28,111 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-15 09:48:36,270 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 09:48:43,609 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-15 09:48:43,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3122770.0, ans=0.125 2024-08-15 09:48:44,835 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.277e+01 2.656e+01 2.964e+01 2.137e+02, threshold=5.312e+01, percent-clipped=1.0 2024-08-15 09:48:44,855 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 7950, loss[loss=0.0976, beats_loss=0.01057, ecapa_loss=0.0001678, whisper_loss=0.08535, over 22480.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01081, ecapa_loss=0.0001501, whisper_loss=0.09014, over 3933993.96 frames. ], batch size: 94, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:48:46,575 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 09:48:51,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-15 09:48:55,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-15 09:49:07,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=8.0 2024-08-15 09:49:08,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3122870.0, ans=0.2 2024-08-15 09:49:10,823 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 34 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 09:49:26,930 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 09:49:29,830 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-15 09:49:32,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3123070.0, ans=0.0 2024-08-15 09:49:38,530 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-15 09:49:39,813 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 09:49:46,567 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 09:49:58,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8000, loss[loss=0.08579, beats_loss=0.01158, ecapa_loss=0.0001446, whisper_loss=0.07276, over 21751.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001502, whisper_loss=0.0901, over 3929584.93 frames. 
], batch size: 88, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:49:59,213 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 09:49:59,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3123270.0, ans=0.0 2024-08-15 09:50:00,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3123270.0, ans=0.125 2024-08-15 09:50:05,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.67 vs. limit=22.5 2024-08-15 09:50:21,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3123370.0, ans=0.125 2024-08-15 09:50:24,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. 
limit=15.0 2024-08-15 09:50:50,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3123570.0, ans=0.0 2024-08-15 09:51:02,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3123670.0, ans=0.0 2024-08-15 09:51:09,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3123670.0, ans=0.0 2024-08-15 09:51:11,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3123670.0, ans=0.0 2024-08-15 09:51:13,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.280e+01 2.544e+01 2.885e+01 5.910e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-15 09:51:13,561 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8050, loss[loss=0.1061, beats_loss=0.01017, ecapa_loss=0.0001537, whisper_loss=0.09438, over 20512.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001504, whisper_loss=0.0902, over 3928515.21 frames. ], batch size: 84, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:51:16,734 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 09:51:18,266 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-15 09:51:22,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0 2024-08-15 09:51:35,551 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-15 09:51:44,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.22 vs. 
limit=15.0 2024-08-15 09:52:20,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3124170.0, ans=0.125 2024-08-15 09:52:21,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3124170.0, ans=0.125 2024-08-15 09:52:24,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3124270.0, ans=0.2 2024-08-15 09:52:25,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8100, loss[loss=0.1236, beats_loss=0.008389, ecapa_loss=0.0001466, whisper_loss=0.1137, over 24809.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001503, whisper_loss=0.09049, over 3914197.17 frames. ], batch size: 94, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:52:38,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3124370.0, ans=0.04949747468305833 2024-08-15 09:52:38,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2024-08-15 09:52:40,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3124370.0, ans=0.125 2024-08-15 09:52:44,180 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-15 09:52:47,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.53 vs. 
limit=6.0 2024-08-15 09:52:55,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3124470.0, ans=0.07 2024-08-15 09:53:02,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3124470.0, ans=0.1 2024-08-15 09:53:17,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3124570.0, ans=0.1 2024-08-15 09:53:40,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.386e+01 2.609e+01 2.958e+01 3.972e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-15 09:53:40,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8150, loss[loss=0.102, beats_loss=0.01226, ecapa_loss=0.0001151, whisper_loss=0.08863, over 15150.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01073, ecapa_loss=0.0001502, whisper_loss=0.09002, over 3917383.69 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:53:43,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3124770.0, ans=0.0 2024-08-15 09:54:08,764 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 09:54:09,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3124870.0, ans=0.125 2024-08-15 09:54:19,934 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-15 09:54:35,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3125070.0, ans=0.125 2024-08-15 09:55:06,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3125270.0, ans=0.09899494936611666 2024-08-15 09:55:06,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8200, loss[loss=0.08977, beats_loss=0.009946, ecapa_loss=0.0001348, whisper_loss=0.07847, over 14756.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001503, whisper_loss=0.08955, over 3882436.56 frames. ], batch size: 55, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:55:13,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-08-15 09:55:14,100 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-15 09:55:19,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3125270.0, ans=15.0 2024-08-15 09:55:33,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3125370.0, ans=0.125 2024-08-15 09:55:46,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3125470.0, ans=0.0 2024-08-15 09:55:47,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.84 vs. 
limit=22.5 2024-08-15 09:56:00,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3125570.0, ans=0.125 2024-08-15 09:56:07,131 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-15 09:56:15,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3125670.0, ans=0.0 2024-08-15 09:56:17,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3125670.0, ans=0.125 2024-08-15 09:56:20,943 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08162933588027954, model_norm_threshold=52.17145538330078 2024-08-15 09:56:21,114 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.265e+04, grad_sumsq=3.250e+06, orig_rms_sq=1.005e-02 2024-08-15 09:56:23,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.208e+01 2.504e+01 2.791e+01 6.391e+02, threshold=5.008e+01, percent-clipped=1.0 2024-08-15 09:56:23,736 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8250, loss[loss=0.09261, beats_loss=0.01187, ecapa_loss=0.0001479, whisper_loss=0.07926, over 18633.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001509, whisper_loss=0.08966, over 3911632.18 frames. ], batch size: 77, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:56:27,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3125770.0, ans=0.125 2024-08-15 09:56:30,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. 
limit=15.0 2024-08-15 09:56:32,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3125770.0, ans=0.2 2024-08-15 09:56:33,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3125770.0, ans=0.125 2024-08-15 09:56:38,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3125870.0, ans=0.0 2024-08-15 09:56:43,767 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-15 09:56:46,693 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 09:56:48,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3125870.0, ans=0.0 2024-08-15 09:56:50,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3125870.0, ans=0.1 2024-08-15 09:57:04,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3125970.0, ans=0.125 2024-08-15 09:57:05,748 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-15 09:57:27,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3126170.0, ans=0.0 2024-08-15 09:57:37,708 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8300, loss[loss=0.1109, beats_loss=0.011, ecapa_loss=0.0001318, whisper_loss=0.09858, over 23413.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01078, ecapa_loss=0.0001488, whisper_loss=0.08921, over 3920557.60 frames. 
], batch size: 89, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:57:38,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3126270.0, ans=0.07 2024-08-15 09:57:49,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2024-08-15 09:57:58,234 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 09:58:04,598 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-15 09:58:14,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3126470.0, ans=0.0 2024-08-15 09:58:21,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5 2024-08-15 09:58:51,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3126670.0, ans=0.125 2024-08-15 09:58:54,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3126670.0, ans=0.0 2024-08-15 09:59:00,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.411e+01 2.673e+01 3.013e+01 2.459e+02, threshold=5.345e+01, percent-clipped=1.0 2024-08-15 09:59:00,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8350, loss[loss=0.08325, beats_loss=0.01413, ecapa_loss=0.0001356, whisper_loss=0.06776, over 21155.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01085, ecapa_loss=0.0001507, whisper_loss=0.08881, over 3941851.36 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 09:59:10,518 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-15 09:59:12,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3126770.0, ans=0.2 2024-08-15 09:59:27,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3126870.0, ans=0.125 2024-08-15 09:59:40,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3126970.0, ans=0.125 2024-08-15 09:59:41,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3126970.0, ans=0.125 2024-08-15 09:59:54,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2024-08-15 10:00:00,996 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 10:00:05,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3127170.0, ans=0.0 2024-08-15 10:00:12,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3127170.0, ans=0.5 2024-08-15 10:00:16,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8400, loss[loss=0.1205, beats_loss=0.008342, ecapa_loss=0.0001896, whisper_loss=0.1103, over 18691.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01072, ecapa_loss=0.0001506, whisper_loss=0.08986, over 3927723.79 frames. ], batch size: 74, lr: 2.83e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:00:19,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.16 vs. 
limit=22.5 2024-08-15 10:00:26,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=3127270.0, ans=22.5 2024-08-15 10:00:39,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3127370.0, ans=0.2 2024-08-15 10:00:50,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.96 vs. limit=5.0 2024-08-15 10:01:05,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3127570.0, ans=0.125 2024-08-15 10:01:09,597 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 from AS 2024-08-15 10:01:18,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3127670.0, ans=0.0 2024-08-15 10:01:29,983 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 from AS 2024-08-15 10:01:31,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.355e+01 2.603e+01 2.827e+01 7.121e+01, threshold=5.205e+01, percent-clipped=1.0 2024-08-15 10:01:31,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8450, loss[loss=0.09567, beats_loss=0.01055, ecapa_loss=0.0001691, whisper_loss=0.08343, over 20385.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001506, whisper_loss=0.08995, over 3903945.48 frames. ], batch size: 83, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:01:34,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3127770.0, ans=0.125 2024-08-15 10:01:47,893 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
25 from LS+wenet, 27 from Vox, 43 from AS 2024-08-15 10:01:50,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3127870.0, ans=0.2 2024-08-15 10:01:54,794 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 from AS 2024-08-15 10:02:16,476 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 from AS 2024-08-15 10:02:25,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=12.0 2024-08-15 10:02:26,354 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS 2024-08-15 10:02:27,947 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 10:02:38,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-15 10:02:40,935 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 10:02:41,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3128170.0, ans=0.125 2024-08-15 10:02:49,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3128170.0, ans=0.125 2024-08-15 10:02:50,897 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 10:02:52,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8500, loss[loss=0.1203, beats_loss=0.01088, ecapa_loss=0.0001431, whisper_loss=0.108, over 24284.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01071, ecapa_loss=0.0001511, whisper_loss=0.08943, over 3915876.09 frames. 
], batch size: 92, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:03:02,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3128270.0, ans=0.125 2024-08-15 10:03:03,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3128270.0, ans=0.0 2024-08-15 10:03:05,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3128270.0, ans=0.125 2024-08-15 10:03:19,291 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 from AS 2024-08-15 10:03:23,004 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.504e+01 2024-08-15 10:03:35,651 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 from AS 2024-08-15 10:03:42,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128570.0, ans=0.1 2024-08-15 10:03:59,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3128670.0, ans=0.1 2024-08-15 10:04:05,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3128670.0, ans=0.05 2024-08-15 10:04:11,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.400e+01 2.720e+01 3.121e+01 2.458e+02, threshold=5.440e+01, percent-clipped=2.0 2024-08-15 10:04:11,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8550, loss[loss=0.08804, beats_loss=0.01051, ecapa_loss=0.0001854, whisper_loss=0.07567, over 16105.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001497, whisper_loss=0.08954, over 3906800.15 frames. 
], batch size: 67, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:04:18,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.68 vs. limit=15.0 2024-08-15 10:04:25,333 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 18 from LS+wenet, 28 from Vox, 46 from AS 2024-08-15 10:04:45,549 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 from AS 2024-08-15 10:04:50,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2024-08-15 10:04:55,597 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS 2024-08-15 10:04:59,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3129070.0, ans=0.0 2024-08-15 10:05:06,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-15 10:05:07,364 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS 2024-08-15 10:05:25,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8600, loss[loss=0.1196, beats_loss=0.01157, ecapa_loss=0.0001601, whisper_loss=0.1064, over 22098.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001494, whisper_loss=0.08995, over 3875011.56 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:05:44,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3129370.0, ans=0.07 2024-08-15 10:05:49,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3129370.0, ans=0.015 2024-08-15 10:05:50,404 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 19 from Vox, 50 from AS 2024-08-15 10:06:01,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129470.0, ans=0.125 2024-08-15 10:06:17,520 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 from AS 2024-08-15 10:06:29,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3129670.0, ans=0.1 2024-08-15 10:06:37,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.410e+01 2.647e+01 2.940e+01 4.400e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-15 10:06:37,951 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8650, loss[loss=0.08964, beats_loss=0.01175, ecapa_loss=0.0001476, whisper_loss=0.07642, over 14826.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001498, whisper_loss=0.08982, over 3865266.79 frames. ], batch size: 59, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:06:39,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.11 vs. limit=22.5 2024-08-15 10:06:41,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3129770.0, ans=0.1 2024-08-15 10:06:49,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3129770.0, ans=0.0 2024-08-15 10:06:53,865 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 27 from LS+wenet, 12 from Vox, 25 from AS 2024-08-15 10:06:55,407 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 24 from Vox, 35 from AS 2024-08-15 10:07:10,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3129970.0, ans=0.125 2024-08-15 10:07:15,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3129970.0, ans=0.125 2024-08-15 10:07:35,957 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 10:07:41,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3130070.0, ans=0.1 2024-08-15 10:07:44,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3130170.0, ans=0.0 2024-08-15 10:08:00,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8700, loss[loss=0.1025, beats_loss=0.009284, ecapa_loss=0.000132, whisper_loss=0.09191, over 17607.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001515, whisper_loss=0.09043, over 3870205.57 frames. ], batch size: 68, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:08:06,853 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 from AS 2024-08-15 10:08:11,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3130270.0, ans=0.2 2024-08-15 10:08:16,658 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 25 from Vox, 28 from AS 2024-08-15 10:08:37,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3130470.0, ans=0.0 2024-08-15 10:08:45,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3130470.0, ans=0.125 2024-08-15 10:09:00,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3130570.0, ans=0.125 2024-08-15 10:09:22,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.333e+01 2.545e+01 2.859e+01 2.640e+02, threshold=5.090e+01, percent-clipped=1.0 2024-08-15 10:09:22,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8750, loss[loss=0.1095, beats_loss=0.0117, ecapa_loss=0.0001256, whisper_loss=0.09652, over 20166.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001511, whisper_loss=0.09008, over 3855424.35 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:09:36,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2024-08-15 10:09:39,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3130870.0, ans=0.015 2024-08-15 10:09:43,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3130870.0, ans=0.125 2024-08-15 10:09:47,610 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 from AS 2024-08-15 10:09:48,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3130870.0, ans=0.2 2024-08-15 10:10:02,303 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
16 from LS+wenet, 17 from Vox, 31 from AS 2024-08-15 10:10:05,150 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 from AS 2024-08-15 10:10:08,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3130970.0, ans=0.07 2024-08-15 10:10:11,852 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 10:10:17,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3131070.0, ans=0.0 2024-08-15 10:10:41,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8800, loss[loss=0.114, beats_loss=0.007719, ecapa_loss=0.0001468, whisper_loss=0.1048, over 18128.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001513, whisper_loss=0.08994, over 3883462.54 frames. ], batch size: 68, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:10:54,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3131370.0, ans=0.0 2024-08-15 10:11:05,014 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 from AS 2024-08-15 10:11:10,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3131470.0, ans=0.125 2024-08-15 10:11:20,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-08-15 10:11:23,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3131470.0, ans=0.125 2024-08-15 10:11:46,375 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
25 from LS+wenet, 14 from Vox, 33 from AS 2024-08-15 10:11:52,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-15 10:11:54,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.280e+01 2.531e+01 2.796e+01 4.372e+01, threshold=5.061e+01, percent-clipped=0.0 2024-08-15 10:11:54,634 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8850, loss[loss=0.08399, beats_loss=0.01325, ecapa_loss=0.0001761, whisper_loss=0.06899, over 14959.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001507, whisper_loss=0.08971, over 3874991.58 frames. ], batch size: 63, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:11:56,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3131770.0, ans=0.125 2024-08-15 10:12:06,401 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 32 from Vox, 32 from AS 2024-08-15 10:12:11,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3131870.0, ans=0.0 2024-08-15 10:12:14,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-15 10:12:16,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-15 10:12:35,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0 2024-08-15 10:12:42,422 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 from AS 2024-08-15 10:12:51,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3132070.0, ans=0.125 2024-08-15 10:12:56,799 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 10:12:59,824 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 17 from Vox, 38 from AS 2024-08-15 10:13:02,731 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 from AS 2024-08-15 10:13:08,892 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8900, loss[loss=0.1139, beats_loss=0.01084, ecapa_loss=0.0001458, whisper_loss=0.1016, over 18951.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001512, whisper_loss=0.08988, over 3873270.88 frames. ], batch size: 73, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:13:09,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3132270.0, ans=10.0 2024-08-15 10:13:13,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3132270.0, ans=0.09899494936611666 2024-08-15 10:13:22,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.91 vs. 
limit=22.5 2024-08-15 10:13:23,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3132370.0, ans=0.2 2024-08-15 10:13:26,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132370.0, ans=0.1 2024-08-15 10:13:29,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3132370.0, ans=0.2 2024-08-15 10:13:34,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3132370.0, ans=0.125 2024-08-15 10:14:11,024 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 11 from Vox, 29 from AS 2024-08-15 10:14:18,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.382e+01 2.533e+01 2.856e+01 1.200e+02, threshold=5.066e+01, percent-clipped=2.0 2024-08-15 10:14:18,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 8950, loss[loss=0.1073, beats_loss=0.01246, ecapa_loss=0.0001128, whisper_loss=0.09369, over 16457.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01072, ecapa_loss=0.0001504, whisper_loss=0.08966, over 3870323.91 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:14:19,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3132770.0, ans=0.125 2024-08-15 10:14:27,887 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 10:14:30,751 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 10:14:32,213 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
26 from LS+wenet, 16 from Vox, 23 from AS 2024-08-15 10:14:42,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3132870.0, ans=0.125 2024-08-15 10:14:43,250 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-15 10:15:06,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-15 10:15:17,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3133170.0, ans=0.1 2024-08-15 10:15:18,647 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-15 10:15:29,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9000, loss[loss=0.0939, beats_loss=0.01143, ecapa_loss=0.0001526, whisper_loss=0.08094, over 22223.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001507, whisper_loss=0.09037, over 3885682.62 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:15:29,238 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 10:16:12,806 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005364, whisper_loss=0.2468, over 922467.00 frames. 2024-08-15 10:16:35,741 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames. 
2024-08-15 10:17:08,178 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2290, 1.9048, 1.8782, 1.7946], device='cuda:3') 2024-08-15 10:18:37,928 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 10:18:37,931 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 10:18:43,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3133270.0, ans=10.0 2024-08-15 10:18:48,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3133270.0, ans=0.125 2024-08-15 10:18:49,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3133270.0, ans=0.125 2024-08-15 10:18:56,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3133370.0, ans=0.2 2024-08-15 10:18:57,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.80 vs. 
limit=22.5 2024-08-15 10:18:59,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3133370.0, ans=0.125 2024-08-15 10:19:09,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3133470.0, ans=0.07 2024-08-15 10:19:45,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3133670.0, ans=0.125 2024-08-15 10:19:48,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.223e+01 2.558e+01 2.882e+01 3.996e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-15 10:19:48,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9050, loss[loss=0.09856, beats_loss=0.01037, ecapa_loss=0.000151, whisper_loss=0.08668, over 20488.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001494, whisper_loss=0.09059, over 3898962.82 frames. ], batch size: 80, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:19:51,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3133770.0, ans=0.0 2024-08-15 10:19:58,257 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 12 from Vox, 30 from AS 2024-08-15 10:20:02,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3133870.0, ans=0.2 2024-08-15 10:20:15,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3133970.0, ans=0.125 2024-08-15 10:20:17,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3133970.0, ans=0.5 2024-08-15 10:20:20,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3133970.0, ans=0.2 2024-08-15 10:20:26,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3133970.0, ans=0.125 2024-08-15 10:20:34,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.30 vs. limit=6.0 2024-08-15 10:20:44,316 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 from AS 2024-08-15 10:20:50,023 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 10:20:57,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9100, loss[loss=0.1161, beats_loss=0.008443, ecapa_loss=0.0001699, whisper_loss=0.106, over 17192.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001497, whisper_loss=0.09081, over 3890593.09 frames. ], batch size: 68, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:21:00,836 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 10:21:02,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3134270.0, ans=0.125 2024-08-15 10:21:15,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3134370.0, ans=0.125 2024-08-15 10:21:20,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3134370.0, ans=0.1 2024-08-15 10:21:26,962 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 from AS 2024-08-15 10:21:43,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3134570.0, ans=0.0 2024-08-15 10:22:01,792 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 10:22:08,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.360e+01 2.655e+01 2.981e+01 2.632e+02, threshold=5.310e+01, percent-clipped=2.0 2024-08-15 10:22:08,915 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9150, loss[loss=0.1063, beats_loss=0.009438, ecapa_loss=0.0001898, whisper_loss=0.09498, over 19147.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001502, whisper_loss=0.09086, over 3919869.89 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:22:28,323 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-15 10:22:29,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3134870.0, ans=0.125 2024-08-15 10:22:40,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3134970.0, ans=0.05 2024-08-15 10:22:41,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3134970.0, ans=0.1 2024-08-15 10:22:42,658 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 10:22:50,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3135070.0, ans=0.125 2024-08-15 10:22:51,463 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 19 from Vox, 25 from AS 2024-08-15 10:22:59,811 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 10:23:14,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3135170.0, ans=0.125 2024-08-15 10:23:20,289 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 from AS 2024-08-15 10:23:23,191 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9200, loss[loss=0.1134, beats_loss=0.006426, ecapa_loss=0.0001958, whisper_loss=0.105, over 14546.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01065, ecapa_loss=0.0001505, whisper_loss=0.09123, over 3905380.54 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:23:32,998 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 16 from LS+wenet, 23 from Vox, 42 from AS 2024-08-15 10:24:16,119 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-15 10:24:21,603 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 from AS 2024-08-15 10:24:28,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2024-08-15 10:24:32,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5 2024-08-15 10:24:36,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3135670.0, ans=0.125 2024-08-15 10:24:46,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.313e+01 2.628e+01 2.847e+01 1.469e+02, threshold=5.255e+01, percent-clipped=1.0 2024-08-15 10:24:46,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9250, loss[loss=0.1194, beats_loss=0.00756, ecapa_loss=0.0001959, whisper_loss=0.1098, over 22487.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001504, whisper_loss=0.091, over 3883707.41 frames. 
], batch size: 91, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:24:47,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3135770.0, ans=0.125 2024-08-15 10:25:10,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3135870.0, ans=0.125 2024-08-15 10:25:35,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3135970.0, ans=0.125 2024-08-15 10:26:00,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3136170.0, ans=15.0 2024-08-15 10:26:02,198 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 from AS 2024-08-15 10:26:04,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3136170.0, ans=0.125 2024-08-15 10:26:05,649 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 from AS 2024-08-15 10:26:12,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3136270.0, ans=0.0 2024-08-15 10:26:13,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-08-15 10:26:13,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9300, loss[loss=0.08301, beats_loss=0.0129, ecapa_loss=0.0001907, whisper_loss=0.0682, over 14187.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001511, whisper_loss=0.09069, over 3885924.10 frames. 
], batch size: 59, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:26:25,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3136270.0, ans=0.1 2024-08-15 10:26:59,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-15 10:27:00,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3136570.0, ans=0.2 2024-08-15 10:27:01,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3136570.0, ans=0.125 2024-08-15 10:27:15,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3136670.0, ans=0.125 2024-08-15 10:27:19,640 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-15 10:27:26,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.363e+01 2.589e+01 2.922e+01 5.036e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-15 10:27:26,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9350, loss[loss=0.124, beats_loss=0.00936, ecapa_loss=0.0001587, whisper_loss=0.113, over 19407.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001509, whisper_loss=0.09088, over 3903108.74 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:27:52,065 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-15 10:28:07,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. 
limit=15.0 2024-08-15 10:28:11,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137070.0, ans=0.125 2024-08-15 10:28:15,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2024-08-15 10:28:18,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3137070.0, ans=0.0 2024-08-15 10:28:29,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3137170.0, ans=0.125 2024-08-15 10:28:35,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=12.0 2024-08-15 10:28:35,961 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9400, loss[loss=0.1166, beats_loss=0.009965, ecapa_loss=0.0001495, whisper_loss=0.1051, over 23119.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001511, whisper_loss=0.09057, over 3913860.55 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:28:36,177 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-15 10:28:43,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3137270.0, ans=0.2 2024-08-15 10:28:52,929 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 10:28:57,146 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 10:29:39,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. 
limit=15.0 2024-08-15 10:29:45,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.347e+01 2.545e+01 2.871e+01 4.993e+01, threshold=5.089e+01, percent-clipped=0.0 2024-08-15 10:29:45,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9450, loss[loss=0.1106, beats_loss=0.009664, ecapa_loss=0.0001767, whisper_loss=0.09913, over 22097.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001508, whisper_loss=0.09069, over 3917247.58 frames. ], batch size: 92, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:29:48,397 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-15 10:30:06,518 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-15 10:30:22,833 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-15 10:30:27,105 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 10:30:43,140 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-15 10:30:54,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9500, loss[loss=0.08805, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.0762, over 15434.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01053, ecapa_loss=0.0001515, whisper_loss=0.09083, over 3913562.63 frames. ], batch size: 58, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:30:56,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=3138270.0, ans=12.0 2024-08-15 10:31:10,253 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
33 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 10:31:21,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3138470.0, ans=0.0 2024-08-15 10:31:47,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3138570.0, ans=0.125 2024-08-15 10:32:03,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.464e+01 2.657e+01 3.059e+01 1.936e+02, threshold=5.313e+01, percent-clipped=3.0 2024-08-15 10:32:03,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9550, loss[loss=0.06516, beats_loss=0.0144, ecapa_loss=0.0001236, whisper_loss=0.04953, over 15651.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001517, whisper_loss=0.09115, over 3903890.96 frames. ], batch size: 67, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:32:32,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3138970.0, ans=0.1 2024-08-15 10:32:42,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3138970.0, ans=0.125 2024-08-15 10:32:57,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3139070.0, ans=0.125 2024-08-15 10:33:07,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3139170.0, ans=0.125 2024-08-15 10:33:12,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2024-08-15 10:33:15,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9600, loss[loss=0.08688, beats_loss=0.01143, ecapa_loss=0.0001725, whisper_loss=0.07373, over 19783.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001521, whisper_loss=0.09089, over 3882377.35 frames. ], batch size: 83, lr: 2.82e-03, grad_scale: 1.152921504606847e+18 2024-08-15 10:33:17,856 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.382e+00 2024-08-15 10:33:25,083 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 10:33:26,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3139270.0, ans=0.125 2024-08-15 10:33:29,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3139370.0, ans=0.1 2024-08-15 10:33:46,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3139470.0, ans=0.125 2024-08-15 10:33:48,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3139470.0, ans=0.125 2024-08-15 10:34:01,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3139570.0, ans=0.125 2024-08-15 10:34:26,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9650, loss[loss=0.1245, beats_loss=0.00979, ecapa_loss=0.0001403, whisper_loss=0.1133, over 23429.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.000152, whisper_loss=0.09082, over 3860307.93 frames. 
], batch size: 91, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:34:27,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3139770.0, ans=0.2 2024-08-15 10:34:27,877 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.224e+01 2.493e+01 2.795e+01 4.633e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-15 10:34:42,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3139870.0, ans=0.125 2024-08-15 10:34:43,409 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 10:35:00,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3139970.0, ans=0.125 2024-08-15 10:35:04,264 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 10:35:14,509 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 10:35:20,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3140070.0, ans=0.125 2024-08-15 10:35:24,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3140170.0, ans=0.2 2024-08-15 10:35:36,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9700, loss[loss=0.101, beats_loss=0.01004, ecapa_loss=0.0001683, whisper_loss=0.08928, over 21909.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.000152, whisper_loss=0.09043, over 3871890.43 frames. ], batch size: 93, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:35:41,710 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
27 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-15 10:36:05,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-15 10:36:09,155 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 10:36:12,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3140470.0, ans=0.5 2024-08-15 10:36:26,021 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 10:36:38,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3140670.0, ans=0.0 2024-08-15 10:36:43,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-15 10:36:45,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9750, loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.0001564, whisper_loss=0.09162, over 20012.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001513, whisper_loss=0.09025, over 3886861.53 frames. ], batch size: 82, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:36:46,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.354e+01 2.591e+01 2.841e+01 9.647e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-15 10:36:58,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2024-08-15 10:37:04,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. 
limit=22.5 2024-08-15 10:37:05,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-08-15 10:37:13,134 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-15 10:37:18,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2024-08-15 10:37:18,854 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-15 10:37:26,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3141070.0, ans=0.0 2024-08-15 10:37:28,881 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 10:37:36,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3141070.0, ans=0.0 2024-08-15 10:37:44,767 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 12 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 10:37:49,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3141170.0, ans=0.125 2024-08-15 10:37:53,478 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 10:37:55,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9800, loss[loss=0.09475, beats_loss=0.01106, ecapa_loss=0.0001527, whisper_loss=0.08216, over 21682.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001514, whisper_loss=0.0896, over 3840205.47 frames. 
], batch size: 90, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:38:07,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141270.0, ans=0.1 2024-08-15 10:38:09,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3141370.0, ans=0.0 2024-08-15 10:38:09,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3141370.0, ans=0.125 2024-08-15 10:38:53,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=22.5 2024-08-15 10:39:05,426 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9850, loss[loss=0.1119, beats_loss=0.009972, ecapa_loss=0.0001371, whisper_loss=0.1006, over 17689.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001521, whisper_loss=0.09029, over 3863342.78 frames. ], batch size: 68, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:39:06,749 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.300e+01 2.506e+01 2.923e+01 9.908e+01, threshold=5.012e+01, percent-clipped=1.0 2024-08-15 10:39:09,707 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 10:39:15,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3141770.0, ans=0.125 2024-08-15 10:39:18,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3141870.0, ans=0.07 2024-08-15 10:39:26,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.54 vs. 
limit=15.0 2024-08-15 10:39:29,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3141870.0, ans=0.2 2024-08-15 10:39:37,161 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 10:39:37,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3141970.0, ans=10.0 2024-08-15 10:39:48,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3142070.0, ans=0.0 2024-08-15 10:39:49,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3142070.0, ans=0.125 2024-08-15 10:39:52,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3142070.0, ans=0.07 2024-08-15 10:40:04,046 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 10:40:13,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9900, loss[loss=0.1099, beats_loss=0.009744, ecapa_loss=0.0001707, whisper_loss=0.0984, over 18599.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001507, whisper_loss=0.09085, over 3901100.11 frames. ], batch size: 74, lr: 2.82e-03, grad_scale: 5.764607523034235e+17 2024-08-15 10:40:18,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3142270.0, ans=0.125 2024-08-15 10:40:19,596 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-15 10:40:22,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3142270.0, ans=0.125 2024-08-15 10:40:28,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3142370.0, ans=0.0 2024-08-15 10:40:54,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3142570.0, ans=0.05 2024-08-15 10:40:55,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3142570.0, ans=0.125 2024-08-15 10:40:58,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-15 10:41:00,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3142570.0, ans=0.2 2024-08-15 10:41:13,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142670.0, ans=0.1 2024-08-15 10:41:14,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142670.0, ans=0.1 2024-08-15 10:41:18,282 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-15 10:41:20,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3142670.0, ans=0.0 2024-08-15 10:41:22,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 9950, loss[loss=0.1139, beats_loss=0.01034, ecapa_loss=0.0001531, whisper_loss=0.102, over 18057.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.000151, whisper_loss=0.09057, over 3869153.95 frames. 
], batch size: 71, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:41:24,922 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.640e+01 2.920e+01 4.147e+01, threshold=5.279e+01, percent-clipped=0.0 2024-08-15 10:41:36,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142870.0, ans=0.125 2024-08-15 10:41:47,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3142870.0, ans=0.125 2024-08-15 10:41:49,363 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 10:41:56,885 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-15 10:42:09,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3143070.0, ans=0.0 2024-08-15 10:42:18,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3143170.0, ans=0.125 2024-08-15 10:42:21,619 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-15 10:42:23,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-15 10:42:24,345 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 31 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-15 10:42:27,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3143170.0, ans=0.1 2024-08-15 10:42:31,491 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 10:42:32,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3143270.0, ans=0.125 2024-08-15 10:42:32,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.19 vs. limit=22.5 2024-08-15 10:42:32,805 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10000, loss[loss=0.0821, beats_loss=0.01265, ecapa_loss=0.0001268, whisper_loss=0.06818, over 16983.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001507, whisper_loss=0.09069, over 3862494.87 frames. ], batch size: 68, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:43:12,087 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-15 10:43:43,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3143670.0, ans=0.2 2024-08-15 10:43:43,998 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-15 10:43:44,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3143670.0, ans=0.125 2024-08-15 10:43:49,169 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 10:43:50,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10050, loss[loss=0.1013, beats_loss=0.01001, ecapa_loss=0.0001755, whisper_loss=0.08953, over 21734.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001514, whisper_loss=0.09088, over 3877286.68 frames. 
], batch size: 90, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:43:53,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.380e+01 2.609e+01 2.956e+01 1.893e+02, threshold=5.219e+01, percent-clipped=1.0 2024-08-15 10:43:57,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3143770.0, ans=0.125 2024-08-15 10:44:04,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3143770.0, ans=0.2 2024-08-15 10:44:06,481 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 35 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 10:44:11,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3143870.0, ans=0.0 2024-08-15 10:44:17,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3143870.0, ans=0.025 2024-08-15 10:44:27,026 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 10:44:39,190 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 10:44:39,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3143970.0, ans=0.0 2024-08-15 10:44:44,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3144070.0, ans=0.125 2024-08-15 10:44:53,687 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-15 10:45:22,313 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 10:45:28,050 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10100, loss[loss=0.1045, beats_loss=0.009718, ecapa_loss=0.0002259, whisper_loss=0.09254, over 22149.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001508, whisper_loss=0.09124, over 3909908.11 frames. ], batch size: 94, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:45:28,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3144270.0, ans=0.05 2024-08-15 10:45:33,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3144270.0, ans=0.2 2024-08-15 10:45:43,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3144270.0, ans=0.125 2024-08-15 10:45:44,930 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 10:46:12,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3144470.0, ans=0.2 2024-08-15 10:46:16,585 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-15 10:46:17,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3144470.0, ans=0.1 2024-08-15 10:46:30,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3144470.0, ans=0.0 2024-08-15 10:46:49,707 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 10:46:52,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3144570.0, ans=0.125 2024-08-15 10:47:24,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10150, loss[loss=0.1163, beats_loss=0.009573, ecapa_loss=0.0001679, whisper_loss=0.1051, over 19326.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01065, ecapa_loss=0.000151, whisper_loss=0.09096, over 3898254.03 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:47:29,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.324e+01 2.588e+01 2.924e+01 3.968e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-15 10:47:36,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3144770.0, ans=0.2 2024-08-15 10:47:47,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3144770.0, ans=0.2 2024-08-15 10:48:49,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0 2024-08-15 10:49:06,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10200, loss[loss=0.09166, beats_loss=0.01335, ecapa_loss=0.0001285, whisper_loss=0.07702, over 18932.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01071, ecapa_loss=0.0001505, whisper_loss=0.09003, over 3892065.13 frames. ], batch size: 76, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:49:15,652 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
36 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-15 10:49:15,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3145270.0, ans=0.125 2024-08-15 10:49:20,420 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 10:49:23,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3145370.0, ans=0.125 2024-08-15 10:49:25,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3145370.0, ans=0.1 2024-08-15 10:49:43,599 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-15 10:49:50,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3145470.0, ans=0.5 2024-08-15 10:49:51,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3145570.0, ans=0.0 2024-08-15 10:50:03,934 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-15 10:50:23,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10250, loss[loss=0.1061, beats_loss=0.009874, ecapa_loss=0.0001468, whisper_loss=0.09481, over 22208.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001508, whisper_loss=0.0901, over 3889821.76 frames. 
], batch size: 87, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:50:26,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.258e+01 2.433e+01 2.798e+01 3.625e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-15 10:50:31,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3145770.0, ans=0.07 2024-08-15 10:50:32,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3145770.0, ans=0.125 2024-08-15 10:50:38,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3145870.0, ans=0.0 2024-08-15 10:50:57,939 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 10:51:00,368 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 10:51:14,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-15 10:51:31,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3146170.0, ans=0.0 2024-08-15 10:51:42,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10300, loss[loss=0.1079, beats_loss=0.009809, ecapa_loss=0.0001299, whisper_loss=0.09678, over 22175.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001504, whisper_loss=0.09017, over 3905732.56 frames. ], batch size: 88, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:51:50,356 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 10:52:07,733 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-15 10:52:42,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3146570.0, ans=0.07 2024-08-15 10:52:45,962 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 10:52:55,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.01 vs. limit=15.0 2024-08-15 10:52:56,892 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 10:53:05,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10350, loss[loss=0.08047, beats_loss=0.01153, ecapa_loss=0.0001318, whisper_loss=0.06763, over 17451.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001499, whisper_loss=0.09108, over 3917042.18 frames. ], batch size: 71, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:53:08,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.405e+01 2.644e+01 3.063e+01 2.497e+02, threshold=5.287e+01, percent-clipped=1.0 2024-08-15 10:53:09,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2024-08-15 10:53:27,072 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-15 10:53:37,846 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 10:53:46,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3146970.0, ans=0.125 2024-08-15 10:53:47,157 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 10:53:52,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.27 vs. limit=22.5 2024-08-15 10:54:00,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3147070.0, ans=0.125 2024-08-15 10:54:02,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-15 10:54:04,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3147070.0, ans=0.0 2024-08-15 10:54:25,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0 2024-08-15 10:54:28,428 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10400, loss[loss=0.1032, beats_loss=0.008884, ecapa_loss=0.0001868, whisper_loss=0.09246, over 17292.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001503, whisper_loss=0.09165, over 3904982.24 frames. ], batch size: 69, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:54:39,172 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
34 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 10:54:45,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3147370.0, ans=0.125 2024-08-15 10:54:54,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3147370.0, ans=0.125 2024-08-15 10:55:16,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3147470.0, ans=0.125 2024-08-15 10:55:22,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2024-08-15 10:55:23,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3147570.0, ans=0.1 2024-08-15 10:55:33,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3147570.0, ans=0.125 2024-08-15 10:55:34,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3147570.0, ans=0.125 2024-08-15 10:55:52,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10450, loss[loss=0.09587, beats_loss=0.01012, ecapa_loss=0.0001328, whisper_loss=0.08442, over 18195.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001507, whisper_loss=0.09124, over 3884263.29 frames. ], batch size: 69, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:55:55,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.272e+01 2.480e+01 2.758e+01 4.514e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-15 10:55:58,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2024-08-15 10:56:12,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3147870.0, ans=0.0 2024-08-15 10:56:19,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3147870.0, ans=0.0 2024-08-15 10:56:24,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-15 10:56:26,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3147970.0, ans=0.125 2024-08-15 10:56:36,021 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 10:56:42,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3148070.0, ans=0.125 2024-08-15 10:56:47,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3148070.0, ans=0.0 2024-08-15 10:56:53,211 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-15 10:57:08,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10500, loss[loss=0.1067, beats_loss=0.007592, ecapa_loss=0.0001625, whisper_loss=0.09751, over 14180.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001507, whisper_loss=0.09135, over 3891057.45 frames. ], batch size: 55, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:57:30,489 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 10:57:32,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3148370.0, ans=0.05 2024-08-15 10:57:33,259 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
18 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-15 10:57:41,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3148470.0, ans=0.0 2024-08-15 10:58:06,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0 2024-08-15 10:58:25,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3148670.0, ans=0.0 2024-08-15 10:58:31,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10550, loss[loss=0.07877, beats_loss=0.01418, ecapa_loss=0.0001468, whisper_loss=0.06312, over 16880.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001509, whisper_loss=0.09079, over 3896951.61 frames. ], batch size: 72, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:58:34,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.372e+01 2.650e+01 2.883e+01 3.926e+01, threshold=5.299e+01, percent-clipped=0.0 2024-08-15 10:58:43,788 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-15 10:59:20,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3149070.0, ans=0.2 2024-08-15 10:59:37,574 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 10:59:39,499 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-15 10:59:49,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10600, loss[loss=0.09119, beats_loss=0.01235, ecapa_loss=0.0001136, whisper_loss=0.0777, over 22440.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001512, whisper_loss=0.09137, over 3907966.98 frames. 
], batch size: 88, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 10:59:52,006 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-15 10:59:52,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3149270.0, ans=0.125 2024-08-15 10:59:52,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3149270.0, ans=22.5 2024-08-15 10:59:59,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3149270.0, ans=0.0 2024-08-15 11:00:07,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3149370.0, ans=0.125 2024-08-15 11:00:08,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3149370.0, ans=0.125 2024-08-15 11:00:09,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3149370.0, ans=0.2 2024-08-15 11:00:14,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0 2024-08-15 11:00:16,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3149370.0, ans=0.125 2024-08-15 11:00:52,884 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 11:00:54,010 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-15 11:00:57,306 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 11:01:06,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10650, loss[loss=0.08784, beats_loss=0.01063, ecapa_loss=0.0001126, whisper_loss=0.07608, over 14345.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01041, ecapa_loss=0.0001515, whisper_loss=0.09205, over 3902740.74 frames. ], batch size: 54, lr: 2.82e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:01:09,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.413e+01 2.629e+01 2.898e+01 3.897e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-15 11:01:14,077 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 11:01:15,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-08-15 11:01:28,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3149870.0, ans=0.0 2024-08-15 11:01:34,546 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-15 11:01:38,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3149870.0, ans=0.015 2024-08-15 11:01:56,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3150070.0, ans=0.0 2024-08-15 11:02:09,583 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-15 11:02:23,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3150170.0, ans=0.125 2024-08-15 11:02:26,719 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-15 11:02:30,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10700, loss[loss=0.09797, beats_loss=0.009093, ecapa_loss=0.0001726, whisper_loss=0.08716, over 17814.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01044, ecapa_loss=0.000152, whisper_loss=0.09214, over 3902618.68 frames. ], batch size: 72, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:02:32,317 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-15 11:02:49,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150370.0, ans=0.1 2024-08-15 11:03:08,832 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 11:03:26,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3150570.0, ans=0.125 2024-08-15 11:03:29,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3150670.0, ans=0.0 2024-08-15 11:03:40,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0 2024-08-15 11:03:44,582 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10750, loss[loss=0.09642, beats_loss=0.01276, ecapa_loss=0.0001289, whisper_loss=0.08237, over 21688.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.09169, over 3918677.87 frames. 
], batch size: 89, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:03:47,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.262e+01 2.469e+01 2.772e+01 4.273e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-15 11:03:50,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3150770.0, ans=0.125 2024-08-15 11:03:56,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3150770.0, ans=0.125 2024-08-15 11:04:11,212 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-15 11:04:25,860 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-15 11:04:30,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3151070.0, ans=0.05 2024-08-15 11:04:31,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3151070.0, ans=0.2 2024-08-15 11:04:44,467 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:04:44,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3151170.0, ans=0.125 2024-08-15 11:04:58,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10800, loss[loss=0.1157, beats_loss=0.009765, ecapa_loss=0.0001326, whisper_loss=0.1046, over 19098.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.00015, whisper_loss=0.09132, over 3888458.08 frames. 
], batch size: 71, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:04:58,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3151270.0, ans=0.1 2024-08-15 11:05:02,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2024-08-15 11:05:24,489 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 11:05:29,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3151470.0, ans=10.0 2024-08-15 11:05:34,638 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-15 11:05:36,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0 2024-08-15 11:05:55,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3151570.0, ans=0.1 2024-08-15 11:06:07,723 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 11:06:14,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3151670.0, ans=0.1 2024-08-15 11:06:22,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10850, loss[loss=0.1184, beats_loss=0.01082, ecapa_loss=0.0001407, whisper_loss=0.1062, over 21366.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01057, ecapa_loss=0.0001499, whisper_loss=0.09175, over 3886059.56 frames. 
], batch size: 84, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:06:26,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.371e+01 2.566e+01 2.885e+01 4.578e+01, threshold=5.132e+01, percent-clipped=0.0 2024-08-15 11:06:32,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3151770.0, ans=0.125 2024-08-15 11:06:33,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3151770.0, ans=0.125 2024-08-15 11:06:56,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=22.5 2024-08-15 11:07:07,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3151970.0, ans=0.2 2024-08-15 11:07:16,087 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:07:18,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3152070.0, ans=0.125 2024-08-15 11:07:44,718 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10900, loss[loss=0.1033, beats_loss=0.01022, ecapa_loss=0.0001774, whisper_loss=0.09134, over 17421.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001497, whisper_loss=0.09084, over 3892625.77 frames. ], batch size: 71, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:07:54,464 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-15 11:08:10,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3152370.0, ans=0.07 2024-08-15 11:08:25,163 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-15 11:08:27,728 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.887e+00 2024-08-15 11:09:09,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 10950, loss[loss=0.09533, beats_loss=0.01374, ecapa_loss=0.0001203, whisper_loss=0.08038, over 22520.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001505, whisper_loss=0.09097, over 3881166.58 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:09:09,712 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 11:09:12,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.384e+01 2.656e+01 2.933e+01 4.855e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-15 11:09:21,928 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 11:09:33,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3152870.0, ans=0.125 2024-08-15 11:09:36,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3152870.0, ans=0.0 2024-08-15 11:09:41,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3152970.0, ans=0.0 2024-08-15 11:09:44,515 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:10:06,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2024-08-15 11:10:10,787 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 11:10:11,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3153170.0, ans=0.125 2024-08-15 11:10:18,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3153170.0, ans=0.1 2024-08-15 11:10:20,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3153170.0, ans=0.125 2024-08-15 11:10:25,262 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11000, loss[loss=0.1244, beats_loss=0.009284, ecapa_loss=0.0001754, whisper_loss=0.1133, over 21482.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001504, whisper_loss=0.09095, over 3880557.19 frames. ], batch size: 88, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:10:34,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3153270.0, ans=0.0 2024-08-15 11:10:39,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3153370.0, ans=0.125 2024-08-15 11:10:40,382 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 11:10:41,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3153370.0, ans=0.0 2024-08-15 11:10:42,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3153370.0, ans=0.125 2024-08-15 11:10:51,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3153370.0, ans=0.2 2024-08-15 11:11:04,401 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 11:11:09,049 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-15 11:11:15,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3153570.0, ans=0.125 2024-08-15 11:11:16,166 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-15 11:11:26,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3153670.0, ans=0.0 2024-08-15 11:11:38,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11050, loss[loss=0.09271, beats_loss=0.01036, ecapa_loss=0.0001466, whisper_loss=0.08088, over 16581.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001511, whisper_loss=0.0915, over 3899160.85 frames. ], batch size: 63, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:11:40,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3153770.0, ans=15.0 2024-08-15 11:11:41,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.292e+01 2.575e+01 2.942e+01 2.806e+02, threshold=5.150e+01, percent-clipped=2.0 2024-08-15 11:11:52,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3153870.0, ans=0.125 2024-08-15 11:11:55,443 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 11:11:57,122 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-15 11:12:24,248 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-15 11:12:36,528 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 11:12:45,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5 2024-08-15 11:12:48,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3154170.0, ans=0.1 2024-08-15 11:12:49,399 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 11:13:00,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11100, loss[loss=0.1156, beats_loss=0.008962, ecapa_loss=0.0002038, whisper_loss=0.1046, over 17832.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01047, ecapa_loss=0.0001512, whisper_loss=0.0918, over 3892519.57 frames. ], batch size: 72, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:13:03,605 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 11:13:22,756 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 11:13:23,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3154370.0, ans=0.125 2024-08-15 11:13:24,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3154370.0, ans=0.0 2024-08-15 11:13:39,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3154470.0, ans=0.125 2024-08-15 11:14:14,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3154670.0, ans=0.125 2024-08-15 11:14:16,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11150, loss[loss=0.08049, beats_loss=0.009171, ecapa_loss=0.0001758, whisper_loss=0.06956, over 16750.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001509, whisper_loss=0.09122, over 3876458.05 frames. ], batch size: 73, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:14:19,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.361e+01 2.547e+01 2.785e+01 4.285e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-15 11:14:21,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-08-15 11:14:37,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-15 11:14:46,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3154970.0, ans=0.125 2024-08-15 11:14:54,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3154970.0, ans=0.125 2024-08-15 11:15:25,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-15 11:15:31,202 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11200, loss[loss=0.1038, beats_loss=0.009866, ecapa_loss=0.0001521, whisper_loss=0.09243, over 16447.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001515, whisper_loss=0.09167, over 3847798.65 frames. 
], batch size: 64, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:15:33,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3155270.0, ans=0.0 2024-08-15 11:15:45,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3155370.0, ans=0.0 2024-08-15 11:16:37,368 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:16:43,983 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11250, loss[loss=0.1146, beats_loss=0.009925, ecapa_loss=0.0001593, whisper_loss=0.1031, over 19749.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.000151, whisper_loss=0.09174, over 3882316.96 frames. ], batch size: 79, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:16:46,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3155770.0, ans=0.0 2024-08-15 11:16:46,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.380e+01 2.622e+01 3.019e+01 1.107e+02, threshold=5.243e+01, percent-clipped=1.0 2024-08-15 11:16:47,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3155770.0, ans=0.0 2024-08-15 11:16:52,979 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 11:17:06,864 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-15 11:17:19,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-15 11:17:23,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.90 vs. 
limit=22.5 2024-08-15 11:17:34,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3156070.0, ans=0.0 2024-08-15 11:17:48,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3156170.0, ans=0.125 2024-08-15 11:17:55,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3156170.0, ans=0.0 2024-08-15 11:18:00,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11300, loss[loss=0.1125, beats_loss=0.01093, ecapa_loss=0.0001409, whisper_loss=0.1002, over 17527.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001494, whisper_loss=0.09108, over 3888270.69 frames. ], batch size: 68, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:18:02,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3156270.0, ans=0.125 2024-08-15 11:18:09,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5 2024-08-15 11:18:35,301 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 11:18:46,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3156470.0, ans=0.0 2024-08-15 11:18:55,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=12.0 2024-08-15 11:18:55,745 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 40 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-15 11:19:09,644 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-15 11:19:26,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11350, loss[loss=0.08773, beats_loss=0.01025, ecapa_loss=0.0001748, whisper_loss=0.07573, over 13719.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001496, whisper_loss=0.09156, over 3903298.83 frames. ], batch size: 56, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:19:29,194 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.374e+01 2.563e+01 2.940e+01 7.855e+01, threshold=5.126e+01, percent-clipped=1.0 2024-08-15 11:19:49,757 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-15 11:19:53,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3156870.0, ans=0.0 2024-08-15 11:19:54,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3156970.0, ans=0.125 2024-08-15 11:20:06,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3156970.0, ans=0.125 2024-08-15 11:20:08,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3156970.0, ans=0.0 2024-08-15 11:20:09,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3157070.0, ans=0.125 2024-08-15 11:20:16,866 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 11:20:25,827 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 11:20:31,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. 
limit=8.0 2024-08-15 11:20:31,930 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 11:20:40,168 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11400, loss[loss=0.1, beats_loss=0.01253, ecapa_loss=0.0001556, whisper_loss=0.08593, over 21798.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001499, whisper_loss=0.09147, over 3892889.28 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:20:53,313 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-15 11:21:14,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157470.0, ans=0.1 2024-08-15 11:21:14,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3157470.0, ans=0.1 2024-08-15 11:21:27,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3157570.0, ans=0.125 2024-08-15 11:21:30,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3157570.0, ans=0.125 2024-08-15 11:21:39,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3157570.0, ans=0.2 2024-08-15 11:21:52,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3157670.0, ans=0.2 2024-08-15 11:21:53,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.98 vs. 
limit=15.0 2024-08-15 11:22:01,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11450, loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.000179, whisper_loss=0.09052, over 22697.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01056, ecapa_loss=0.0001493, whisper_loss=0.09197, over 3918328.25 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:22:04,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.396e+01 2.613e+01 2.879e+01 7.410e+02, threshold=5.227e+01, percent-clipped=0.0 2024-08-15 11:22:04,386 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07053599506616592, model_norm_threshold=52.26521682739258 2024-08-15 11:22:04,560 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.421e+04, grad_sumsq=9.420e+04, orig_rms_sq=5.754e-01 2024-08-15 11:22:04,861 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-15 11:22:08,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3157770.0, ans=0.02 2024-08-15 11:22:23,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3157870.0, ans=0.0 2024-08-15 11:22:33,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3157870.0, ans=0.125 2024-08-15 11:22:36,917 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:22:54,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3158070.0, ans=0.0 2024-08-15 11:22:59,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-08-15 11:23:08,434 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 11:23:22,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3158270.0, ans=0.0 2024-08-15 11:23:23,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11500, loss[loss=0.07951, beats_loss=0.01014, ecapa_loss=0.0001898, whisper_loss=0.06747, over 15729.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001496, whisper_loss=0.09134, over 3881770.07 frames. ], batch size: 66, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:23:33,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3158270.0, ans=0.05 2024-08-15 11:23:46,456 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-15 11:23:49,745 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 11:24:02,191 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 27 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-15 11:24:19,100 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 11:24:40,606 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11550, loss[loss=0.1137, beats_loss=0.01004, ecapa_loss=0.0001477, whisper_loss=0.1022, over 22950.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01048, ecapa_loss=0.0001506, whisper_loss=0.09195, over 3876098.05 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:24:44,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.411e+01 2.579e+01 2.880e+01 5.127e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-15 11:24:55,310 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 11:25:35,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3159070.0, ans=0.0 2024-08-15 11:25:40,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-08-15 11:25:44,166 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 11:25:46,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. 
limit=22.5 2024-08-15 11:25:54,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3159170.0, ans=0.05 2024-08-15 11:25:57,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11600, loss[loss=0.09568, beats_loss=0.01204, ecapa_loss=0.0001456, whisper_loss=0.08218, over 22045.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01045, ecapa_loss=0.0001513, whisper_loss=0.0916, over 3875606.16 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:26:23,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-08-15 11:26:24,456 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-15 11:26:48,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3159570.0, ans=0.0 2024-08-15 11:27:00,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3159670.0, ans=0.125 2024-08-15 11:27:13,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-15 11:27:16,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11650, loss[loss=0.1006, beats_loss=0.0106, ecapa_loss=0.0001367, whisper_loss=0.08859, over 17979.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.0001521, whisper_loss=0.09167, over 3896785.05 frames. 
], batch size: 71, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:27:19,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.447e+01 2.693e+01 2.991e+01 1.020e+02, threshold=5.386e+01, percent-clipped=2.0 2024-08-15 11:27:29,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3159770.0, ans=0.125 2024-08-15 11:27:29,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3159770.0, ans=0.125 2024-08-15 11:27:40,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=12.0 2024-08-15 11:27:47,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3159970.0, ans=0.125 2024-08-15 11:28:01,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3159970.0, ans=0.1 2024-08-15 11:28:05,244 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 11:28:23,321 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-15 11:28:29,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3160170.0, ans=0.2 2024-08-15 11:28:30,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3160170.0, ans=0.05 2024-08-15 11:28:34,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11700, loss[loss=0.1206, beats_loss=0.008821, ecapa_loss=0.0001233, whisper_loss=0.1105, over 22274.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001516, whisper_loss=0.09188, over 3918091.45 frames. ], batch size: 80, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:28:51,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-15 11:28:57,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3160370.0, ans=0.1 2024-08-15 11:29:06,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3160470.0, ans=0.0 2024-08-15 11:29:12,261 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-15 11:29:31,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.56 vs. limit=22.5 2024-08-15 11:29:48,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11750, loss[loss=0.09816, beats_loss=0.009653, ecapa_loss=0.00016, whisper_loss=0.08691, over 17025.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01069, ecapa_loss=0.0001514, whisper_loss=0.09135, over 3948954.91 frames. 
], batch size: 69, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:29:52,018 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.446e+01 2.685e+01 3.012e+01 3.635e+02, threshold=5.370e+01, percent-clipped=2.0 2024-08-15 11:29:54,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3160770.0, ans=0.125 2024-08-15 11:29:55,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3160770.0, ans=0.125 2024-08-15 11:30:10,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3160870.0, ans=0.125 2024-08-15 11:30:21,710 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 11:30:24,563 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 11:30:30,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3160970.0, ans=0.125 2024-08-15 11:30:32,141 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-15 11:30:32,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3161070.0, ans=0.1 2024-08-15 11:30:40,413 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.625e+00 2024-08-15 11:30:52,893 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-15 11:30:53,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161170.0, ans=0.1 2024-08-15 11:31:03,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11800, loss[loss=0.1032, beats_loss=0.008639, ecapa_loss=0.0001892, whisper_loss=0.09268, over 18306.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01065, ecapa_loss=0.0001522, whisper_loss=0.09152, over 3940426.13 frames. ], batch size: 75, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:31:06,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161270.0, ans=0.1 2024-08-15 11:31:13,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3161270.0, ans=0.125 2024-08-15 11:31:15,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3161270.0, ans=0.5 2024-08-15 11:31:19,016 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 12 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-15 11:31:28,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. 
limit=15.0 2024-08-15 11:31:42,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3161470.0, ans=0.04949747468305833 2024-08-15 11:31:51,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161570.0, ans=0.1 2024-08-15 11:31:55,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3161570.0, ans=0.0 2024-08-15 11:32:00,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3161670.0, ans=0.125 2024-08-15 11:32:15,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11850, loss[loss=0.106, beats_loss=0.01077, ecapa_loss=0.0001353, whisper_loss=0.09391, over 22206.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0107, ecapa_loss=0.0001515, whisper_loss=0.09139, over 3927045.43 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:32:17,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.442e+01 2.720e+01 2.983e+01 2.168e+02, threshold=5.440e+01, percent-clipped=2.0 2024-08-15 11:32:25,406 WARNING [optim.py:496] (3/4) Scaling gradients by 0.029826095327734947, model_norm_threshold=54.40060806274414 2024-08-15 11:32:25,592 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.872e+05, grad_sumsq=7.654e+04, orig_rms_sq=8.977e+00 2024-08-15 11:32:26,091 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-15 11:32:47,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. 
limit=15.0 2024-08-15 11:33:06,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-15 11:33:13,104 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-15 11:33:15,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3162170.0, ans=0.0 2024-08-15 11:33:17,567 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 11:33:25,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-15 11:33:29,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11900, loss[loss=0.09939, beats_loss=0.01102, ecapa_loss=0.0001456, whisper_loss=0.08691, over 14019.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0108, ecapa_loss=0.0001512, whisper_loss=0.09035, over 3959062.03 frames. ], batch size: 57, lr: 2.81e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 11:33:30,779 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 11:33:39,903 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-15 11:33:42,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.33 vs. limit=12.0 2024-08-15 11:33:48,693 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-15 11:33:57,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3162370.0, ans=0.2 2024-08-15 11:34:35,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3162670.0, ans=0.125 2024-08-15 11:34:43,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 11950, loss[loss=0.09328, beats_loss=0.01357, ecapa_loss=0.0001724, whisper_loss=0.07799, over 21478.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01068, ecapa_loss=0.0001521, whisper_loss=0.09044, over 3910349.35 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:34:44,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3162770.0, ans=0.035 2024-08-15 11:34:46,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.257e+01 2.477e+01 2.736e+01 1.824e+03, threshold=4.954e+01, percent-clipped=1.0 2024-08-15 11:34:51,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3162770.0, ans=0.1 2024-08-15 11:34:57,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3162870.0, ans=0.2 2024-08-15 11:35:03,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3162870.0, ans=0.125 2024-08-15 11:35:12,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.76 vs. 
limit=15.0 2024-08-15 11:35:19,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3162970.0, ans=0.125 2024-08-15 11:35:26,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3163070.0, ans=0.2 2024-08-15 11:35:29,386 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-15 11:35:57,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12000, loss[loss=0.09797, beats_loss=0.01151, ecapa_loss=0.0001443, whisper_loss=0.08502, over 21544.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001518, whisper_loss=0.09046, over 3893125.71 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:35:57,238 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 11:36:35,753 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005396, whisper_loss=0.2462, over 922467.00 frames. 2024-08-15 11:36:50,456 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6960, 4.5182, 3.7387, 3.9742], device='cuda:3') 2024-08-15 11:36:56,036 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on SV_voxceleb1: loss=0.004196, beats_loss=0, ecapa_loss=0.0004196, whisper_loss=0, over 939242.00 frames. 2024-08-15 11:38:51,465 INFO [train_multi_KD3.py:1149] (3/4) Epoch 22, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-15 11:38:51,469 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 11:39:14,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3163370.0, ans=0.125 2024-08-15 11:39:42,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2024-08-15 11:40:04,831 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12050, loss[loss=0.07277, beats_loss=0.01192, ecapa_loss=0.000161, whisper_loss=0.05924, over 14933.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.000153, whisper_loss=0.08975, over 3855938.70 frames. ], batch size: 62, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:40:05,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-15 11:40:07,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.427e+01 2.583e+01 3.021e+01 1.024e+02, threshold=5.165e+01, percent-clipped=2.0 2024-08-15 11:40:08,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3163770.0, ans=0.1 2024-08-15 11:40:09,822 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 11:40:10,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3163770.0, ans=0.125 2024-08-15 11:40:11,297 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
36 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 11:40:11,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3163770.0, ans=0.125 2024-08-15 11:40:23,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3163870.0, ans=0.0 2024-08-15 11:40:38,139 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-15 11:40:40,834 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-15 11:41:03,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3164170.0, ans=0.0 2024-08-15 11:41:08,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3164170.0, ans=0.2 2024-08-15 11:41:12,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3164170.0, ans=0.2 2024-08-15 11:41:14,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.12 vs. limit=22.5 2024-08-15 11:41:18,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3164270.0, ans=0.0 2024-08-15 11:41:19,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12100, loss[loss=0.09906, beats_loss=0.01051, ecapa_loss=0.0001426, whisper_loss=0.08712, over 20555.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001529, whisper_loss=0.09009, over 3863824.97 frames. 
], batch size: 81, lr: 2.81e-03, grad_scale: 5.764607523034235e+17 2024-08-15 11:41:22,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3164270.0, ans=0.0 2024-08-15 11:41:24,630 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-15 11:41:26,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2024-08-15 11:41:39,762 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 11:42:22,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2024-08-15 11:42:28,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0 2024-08-15 11:42:31,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12150, loss[loss=0.119, beats_loss=0.01063, ecapa_loss=0.0001222, whisper_loss=0.1072, over 23504.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001528, whisper_loss=0.09095, over 3876649.79 frames. 
], batch size: 90, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:42:34,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.205e+01 2.453e+01 2.798e+01 9.875e+01, threshold=4.907e+01, percent-clipped=1.0
2024-08-15 11:43:25,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3165070.0, ans=0.95
2024-08-15 11:43:25,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165070.0, ans=0.1
2024-08-15 11:43:41,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3165170.0, ans=0.125
2024-08-15 11:43:46,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12200, loss[loss=0.1035, beats_loss=0.01109, ecapa_loss=0.0001471, whisper_loss=0.09097, over 21950.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0106, ecapa_loss=0.0001528, whisper_loss=0.09036, over 3882209.48 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:44:07,941 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 from AS
2024-08-15 11:44:19,237 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 from AS
2024-08-15 11:44:27,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3165470.0, ans=0.1
2024-08-15 11:44:43,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3165570.0, ans=0.125
2024-08-15 11:44:51,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=12.0
2024-08-15 11:44:54,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3165670.0, ans=0.125
2024-08-15 11:45:01,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12250, loss[loss=0.1192, beats_loss=0.01024, ecapa_loss=0.0001494, whisper_loss=0.1075, over 22988.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001528, whisper_loss=0.09079, over 3891233.16 frames. ], batch size: 89, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:45:04,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.418e+01 2.740e+01 3.244e+01 5.356e+01, threshold=5.480e+01, percent-clipped=1.0
2024-08-15 11:45:11,336 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 11:45:13,240 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.687e-01
2024-08-15 11:45:16,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3165870.0, ans=0.125
2024-08-15 11:45:32,099 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 25 from Vox, 46 from AS
2024-08-15 11:45:32,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3165970.0, ans=0.09899494936611666
2024-08-15 11:45:32,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3165970.0, ans=0.125
2024-08-15 11:45:35,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0
2024-08-15 11:45:42,804 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 28 from LS+wenet, 18 from Vox, 21 from AS
2024-08-15 11:45:50,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3166070.0, ans=10.0
2024-08-15 11:46:00,696 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.315e-02
2024-08-15 11:46:02,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3166170.0, ans=0.125
2024-08-15 11:46:15,640 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-15 11:46:16,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12300, loss[loss=0.1267, beats_loss=0.009089, ecapa_loss=0.0001621, whisper_loss=0.116, over 22063.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001535, whisper_loss=0.09101, over 3875397.67 frames. ], batch size: 90, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:46:17,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3166270.0, ans=0.0
2024-08-15 11:46:45,661 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 from AS
2024-08-15 11:46:47,437 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS
2024-08-15 11:46:53,807 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.402e+00
2024-08-15 11:47:00,687 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 14 from Vox, 50 from AS
2024-08-15 11:47:06,254 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 from AS
2024-08-15 11:47:12,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3166570.0, ans=0.125
2024-08-15 11:47:17,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3166670.0, ans=0.125
2024-08-15 11:47:29,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12350, loss[loss=0.09806, beats_loss=0.009869, ecapa_loss=0.0001242, whisper_loss=0.08695, over 16041.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001537, whisper_loss=0.09013, over 3873866.29 frames. ], batch size: 62, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:47:32,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.362e+01 2.585e+01 2.912e+01 4.342e+01, threshold=5.170e+01, percent-clipped=0.0
2024-08-15 11:47:38,582 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 from AS
2024-08-15 11:47:48,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3166870.0, ans=0.1
2024-08-15 11:48:05,813 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS
2024-08-15 11:48:08,540 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 11:48:23,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3167070.0, ans=0.0
2024-08-15 11:48:25,195 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 from AS
2024-08-15 11:48:28,183 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 13 from Vox, 42 from AS
2024-08-15 11:48:43,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12400, loss[loss=0.09462, beats_loss=0.008526, ecapa_loss=0.00023, whisper_loss=0.08379, over 17088.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001527, whisper_loss=0.09063, over 3897348.87 frames. ], batch size: 73, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:48:47,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3167270.0, ans=0.0
2024-08-15 11:48:53,346 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 21 from Vox, 27 from AS
2024-08-15 11:48:53,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3167270.0, ans=0.025
2024-08-15 11:48:55,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3167270.0, ans=0.0
2024-08-15 11:48:58,004 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 12 from LS+wenet, 22 from Vox, 28 from AS
2024-08-15 11:49:20,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3167470.0, ans=0.125
2024-08-15 11:49:31,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3167570.0, ans=0.0
2024-08-15 11:49:35,093 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 26 from Vox, 25 from AS
2024-08-15 11:49:41,332 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=6.284e-02
2024-08-15 11:49:42,408 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS
2024-08-15 11:49:44,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3167670.0, ans=0.125
2024-08-15 11:49:58,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12450, loss[loss=0.1134, beats_loss=0.01017, ecapa_loss=0.000141, whisper_loss=0.1018, over 23203.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.000153, whisper_loss=0.09001, over 3896480.66 frames. ], batch size: 91, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:49:58,453 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 11:50:01,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.281e+01 2.553e+01 2.853e+01 4.118e+01, threshold=5.106e+01, percent-clipped=0.0
2024-08-15 11:50:12,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3167870.0, ans=0.125
2024-08-15 11:50:17,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3167870.0, ans=15.0
2024-08-15 11:50:30,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3167970.0, ans=0.125
2024-08-15 11:50:44,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3168070.0, ans=0.07
2024-08-15 11:50:53,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3168070.0, ans=0.125
2024-08-15 11:50:57,229 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 from AS
2024-08-15 11:51:03,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0
2024-08-15 11:51:11,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12500, loss[loss=0.1169, beats_loss=0.008821, ecapa_loss=0.0001733, whisper_loss=0.1064, over 20204.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.000151, whisper_loss=0.09003, over 3898822.72 frames. ], batch size: 81, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:51:11,879 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS
2024-08-15 11:51:25,604 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 24 from Vox, 27 from AS
2024-08-15 11:51:56,454 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 from AS
2024-08-15 11:52:26,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12550, loss[loss=0.1014, beats_loss=0.01213, ecapa_loss=0.0001309, whisper_loss=0.08797, over 15843.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001501, whisper_loss=0.08994, over 3901536.86 frames. ], batch size: 62, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:52:29,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.452e+01 2.757e+01 2.941e+01 1.392e+02, threshold=5.513e+01, percent-clipped=1.0
2024-08-15 11:52:35,666 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS
2024-08-15 11:52:46,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5
2024-08-15 11:52:55,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3168970.0, ans=0.0
2024-08-15 11:52:57,064 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 from AS
2024-08-15 11:53:12,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3169070.0, ans=0.125
2024-08-15 11:53:40,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12600, loss[loss=0.103, beats_loss=0.01172, ecapa_loss=0.0001417, whisper_loss=0.08983, over 19982.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001498, whisper_loss=0.0906, over 3906701.52 frames. ], batch size: 80, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:53:50,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-15 11:53:56,002 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 from AS
2024-08-15 11:53:56,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3169370.0, ans=0.125
2024-08-15 11:54:05,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3169370.0, ans=0.0
2024-08-15 11:54:09,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2024-08-15 11:54:13,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3169470.0, ans=0.95
2024-08-15 11:54:26,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=22.5
2024-08-15 11:54:27,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3169570.0, ans=0.0
2024-08-15 11:54:31,732 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 from AS
2024-08-15 11:54:33,451 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 29 from Vox, 41 from AS
2024-08-15 11:54:52,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0
2024-08-15 11:54:56,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12650, loss[loss=0.09459, beats_loss=0.01001, ecapa_loss=0.0001903, whisper_loss=0.08268, over 18750.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001516, whisper_loss=0.09002, over 3914572.21 frames. ], batch size: 81, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:54:56,318 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-15 11:54:58,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.444e+01 2.686e+01 2.954e+01 5.186e+01, threshold=5.373e+01, percent-clipped=0.0
2024-08-15 11:54:59,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3169770.0, ans=0.125
2024-08-15 11:55:03,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3169770.0, ans=0.0
2024-08-15 11:55:14,138 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 20 from Vox, 31 from AS
2024-08-15 11:55:32,429 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 from AS
2024-08-15 11:55:35,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3169970.0, ans=0.1
2024-08-15 11:55:41,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3170070.0, ans=0.125
2024-08-15 11:55:42,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0
2024-08-15 11:55:46,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3170070.0, ans=0.125
2024-08-15 11:55:47,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3170070.0, ans=0.0
2024-08-15 11:55:51,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3170070.0, ans=10.0
2024-08-15 11:56:05,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3170170.0, ans=0.125
2024-08-15 11:56:09,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12700, loss[loss=0.1009, beats_loss=0.01078, ecapa_loss=0.0001489, whisper_loss=0.08866, over 20685.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001512, whisper_loss=0.08987, over 3897579.40 frames. ], batch size: 81, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:56:12,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3170270.0, ans=0.0
2024-08-15 11:56:14,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2024-08-15 11:56:15,197 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS
2024-08-15 11:56:34,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3170370.0, ans=0.125
2024-08-15 11:56:43,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3170470.0, ans=0.125
2024-08-15 11:56:45,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=12.0
2024-08-15 11:56:48,389 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS
2024-08-15 11:56:50,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3170470.0, ans=0.0
2024-08-15 11:56:59,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2024-08-15 11:57:22,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12750, loss[loss=0.1055, beats_loss=0.01078, ecapa_loss=0.0002019, whisper_loss=0.09266, over 21118.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001518, whisper_loss=0.09099, over 3936747.83 frames. ], batch size: 92, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:57:24,182 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 from AS
2024-08-15 11:57:25,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.276e+01 2.433e+01 2.763e+01 4.017e+01, threshold=4.866e+01, percent-clipped=0.0
2024-08-15 11:58:04,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.37 vs. limit=6.0
2024-08-15 11:58:06,271 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 23 from Vox, 23 from AS
2024-08-15 11:58:11,437 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS
2024-08-15 11:58:20,135 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 11:58:29,768 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS
2024-08-15 11:58:30,030 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.727e-02
2024-08-15 11:58:32,500 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 33 from Vox, 31 from AS
2024-08-15 11:58:39,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12800, loss[loss=0.1216, beats_loss=0.01059, ecapa_loss=0.0001408, whisper_loss=0.1096, over 19061.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01056, ecapa_loss=0.0001527, whisper_loss=0.09149, over 3903258.71 frames. ], batch size: 75, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:59:02,790 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 from AS
2024-08-15 11:59:17,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3171470.0, ans=0.0
2024-08-15 11:59:19,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3171470.0, ans=0.0
2024-08-15 11:59:29,506 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 19 from Vox, 34 from AS
2024-08-15 11:59:32,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3171570.0, ans=0.125
2024-08-15 11:59:37,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3171570.0, ans=0.125
2024-08-15 11:59:53,321 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 from AS
2024-08-15 11:59:54,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12850, loss[loss=0.0929, beats_loss=0.01126, ecapa_loss=0.0001641, whisper_loss=0.08, over 17515.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001524, whisper_loss=0.09024, over 3871160.99 frames. ], batch size: 70, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 11:59:57,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.257e+01 2.519e+01 2.816e+01 4.550e+01, threshold=5.038e+01, percent-clipped=0.0
2024-08-15 12:00:05,266 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 17 from LS+wenet, 30 from Vox, 39 from AS
2024-08-15 12:00:11,280 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 12 from Vox, 32 from AS
2024-08-15 12:00:14,534 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 12:00:30,235 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 26 from Vox, 32 from AS
2024-08-15 12:00:35,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3171970.0, ans=0.0
2024-08-15 12:00:40,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0
2024-08-15 12:00:48,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3172070.0, ans=0.125
2024-08-15 12:00:57,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0
2024-08-15 12:01:08,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12900, loss[loss=0.1165, beats_loss=0.01055, ecapa_loss=0.0001317, whisper_loss=0.1046, over 24530.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001537, whisper_loss=0.08955, over 3845879.96 frames. ], batch size: 93, lr: 2.81e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:01:12,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3172270.0, ans=0.125
2024-08-15 12:01:18,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0
2024-08-15 12:01:21,525 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 21 from Vox, 49 from AS
2024-08-15 12:01:32,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3172370.0, ans=0.0
2024-08-15 12:02:12,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3172670.0, ans=0.125
2024-08-15 12:02:21,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 12950, loss[loss=0.1023, beats_loss=0.01155, ecapa_loss=0.0001757, whisper_loss=0.08899, over 20958.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001532, whisper_loss=0.09001, over 3856555.69 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:02:24,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.275e+01 2.546e+01 2.873e+01 4.108e+01, threshold=5.092e+01, percent-clipped=0.0
2024-08-15 12:02:47,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3172870.0, ans=0.0
2024-08-15 12:03:03,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3172970.0, ans=0.125
2024-08-15 12:03:06,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3173070.0, ans=0.2
2024-08-15 12:03:07,796 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 20 from Vox, 41 from AS
2024-08-15 12:03:09,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173070.0, ans=0.1
2024-08-15 12:03:09,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0
2024-08-15 12:03:12,023 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 from AS
2024-08-15 12:03:14,968 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 from AS
2024-08-15 12:03:17,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0
2024-08-15 12:03:29,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.26 vs. limit=6.0
2024-08-15 12:03:35,529 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 12 from Vox, 46 from AS
2024-08-15 12:03:37,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13000, loss[loss=0.096, beats_loss=0.01257, ecapa_loss=0.0001053, whisper_loss=0.08237, over 20477.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001536, whisper_loss=0.09095, over 3874845.49 frames. ], batch size: 80, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:03:42,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3173270.0, ans=0.0
2024-08-15 12:03:48,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3173270.0, ans=0.125
2024-08-15 12:04:20,482 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0589878149330616, model_norm_threshold=50.92251968383789
2024-08-15 12:04:20,650 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.49, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.672e+05, grad_sumsq=3.654e+07, orig_rms_sq=1.005e-02
2024-08-15 12:04:40,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3173670.0, ans=0.04949747468305833
2024-08-15 12:04:51,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13050, loss[loss=0.1044, beats_loss=0.009026, ecapa_loss=0.0001355, whisper_loss=0.09406, over 16955.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.000153, whisper_loss=0.0906, over 3891795.44 frames. ], batch size: 63, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:04:54,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.413e+01 2.554e+01 2.771e+01 8.633e+02, threshold=5.107e+01, percent-clipped=2.0
2024-08-15 12:05:02,045 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 20 from Vox, 22 from AS
2024-08-15 12:05:08,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3173870.0, ans=0.125
2024-08-15 12:05:13,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0
2024-08-15 12:05:19,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3173870.0, ans=0.125
2024-08-15 12:05:20,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3173970.0, ans=0.0
2024-08-15 12:05:21,726 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 from AS
2024-08-15 12:05:26,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0
2024-08-15 12:05:32,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3173970.0, ans=0.0
2024-08-15 12:06:06,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13100, loss[loss=0.09526, beats_loss=0.008989, ecapa_loss=0.0001382, whisper_loss=0.08489, over 18890.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001521, whisper_loss=0.09018, over 3876084.92 frames. ], batch size: 72, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:06:13,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3174270.0, ans=10.0
2024-08-15 12:06:18,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3174270.0, ans=0.125
2024-08-15 12:06:25,125 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 from AS
2024-08-15 12:06:32,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0
2024-08-15 12:06:34,472 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 from AS
2024-08-15 12:06:41,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3174470.0, ans=0.0
2024-08-15 12:07:00,295 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 from AS
2024-08-15 12:07:08,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.64 vs. limit=10.0
2024-08-15 12:07:11,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0
2024-08-15 12:07:18,106 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 from AS
2024-08-15 12:07:20,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13150, loss[loss=0.09645, beats_loss=0.009585, ecapa_loss=0.0001713, whisper_loss=0.08515, over 15880.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001513, whisper_loss=0.09041, over 3875875.16 frames. ], batch size: 62, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:07:23,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.322e+01 2.580e+01 2.894e+01 4.254e+01, threshold=5.159e+01, percent-clipped=0.0
2024-08-15 12:07:40,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3174870.0, ans=0.125
2024-08-15 12:07:47,639 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 from AS
2024-08-15 12:08:11,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3175070.0, ans=0.0
2024-08-15 12:08:14,360 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 from AS
2024-08-15 12:08:31,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3175170.0, ans=0.125
2024-08-15 12:08:32,655 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 9 from LS+wenet, 21 from Vox, 32 from AS
2024-08-15 12:08:34,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13200, loss[loss=0.06428, beats_loss=0.01274, ecapa_loss=0.0001718, whisper_loss=0.04982, over 15057.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001501, whisper_loss=0.09034, over 3872192.54 frames. ], batch size: 62, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:08:35,112 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.22 vs. limit=10.0
2024-08-15 12:09:14,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3175470.0, ans=0.5
2024-08-15 12:09:31,963 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 15 from Vox, 33 from AS
2024-08-15 12:09:50,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13250, loss[loss=0.07708, beats_loss=0.01382, ecapa_loss=0.0001268, whisper_loss=0.06198, over 20676.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001512, whisper_loss=0.08991, over 3851322.46 frames. ], batch size: 85, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:09:53,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.280e+01 2.540e+01 2.785e+01 5.121e+01, threshold=5.079e+01, percent-clipped=0.0
2024-08-15 12:09:53,648 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 from AS
2024-08-15 12:10:02,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3175770.0, ans=0.0
2024-08-15 12:10:08,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3175870.0, ans=0.125
2024-08-15 12:10:29,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3175970.0, ans=0.125
2024-08-15 12:10:45,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3176070.0, ans=0.125
2024-08-15 12:10:47,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3176070.0, ans=22.5
2024-08-15 12:11:05,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13300, loss[loss=0.1192, beats_loss=0.01025, ecapa_loss=0.000155, whisper_loss=0.1074, over 23134.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001503, whisper_loss=0.0908, over 3824864.24 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:11:13,269 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 14 from Vox, 36 from AS
2024-08-15 12:11:21,890 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS
2024-08-15 12:11:44,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0
2024-08-15 12:11:47,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176470.0, ans=0.1
2024-08-15 12:11:54,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3176570.0, ans=0.125
2024-08-15 12:11:55,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3176570.0, ans=0.1
2024-08-15 12:11:58,194 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 from AS
2024-08-15 12:12:17,432 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 22 from Vox, 21 from AS
2024-08-15 12:12:17,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3176770.0, ans=0.125
2024-08-15 12:12:18,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13350, loss[loss=0.1088, beats_loss=0.008068, ecapa_loss=0.0001858, whisper_loss=0.09885, over 16379.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001491, whisper_loss=0.09083, over 3845161.67 frames. ], batch size: 66, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:12:21,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.329e+01 2.607e+01 2.940e+01 2.592e+02, threshold=5.213e+01, percent-clipped=3.0
2024-08-15 12:12:37,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3176870.0, ans=0.0
2024-08-15 12:12:57,258 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 from AS
2024-08-15 12:13:17,993 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 from AS
2024-08-15 12:13:25,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3177170.0, ans=0.125
2024-08-15 12:13:32,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13400, loss[loss=0.1051, beats_loss=0.008958, ecapa_loss=0.0001682, whisper_loss=0.0945, over 19616.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09079, over 3833197.08 frames. ], batch size: 80, lr: 2.80e-03, grad_scale: 5.764607523034235e+17
2024-08-15 12:13:34,274 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-15 12:13:35,615 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 from AS
2024-08-15 12:13:39,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3177270.0, ans=0.2
2024-08-15 12:13:47,547 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 from AS
2024-08-15 12:14:18,273 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 12:14:18,582 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:14:18,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3177570.0, ans=0.125 2024-08-15 12:14:36,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0 2024-08-15 12:14:45,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13450, loss[loss=0.1064, beats_loss=0.009094, ecapa_loss=0.0001977, whisper_loss=0.09532, over 21109.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.00015, whisper_loss=0.09043, over 3884347.82 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:14:48,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.536e+01 2.666e+01 2.899e+01 1.016e+02, threshold=5.331e+01, percent-clipped=2.0 2024-08-15 12:14:49,035 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 12:14:54,623 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 12:15:05,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-15 12:15:13,967 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-15 12:15:15,772 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:15:30,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3178070.0, ans=0.0 2024-08-15 12:15:52,253 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 12:15:56,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3178170.0, ans=0.2 2024-08-15 12:16:00,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13500, loss[loss=0.102, beats_loss=0.00845, ecapa_loss=0.0001649, whisper_loss=0.09187, over 16494.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001503, whisper_loss=0.09032, over 3894448.01 frames. ], batch size: 65, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:16:31,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3178470.0, ans=0.1 2024-08-15 12:16:34,521 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.614e+05 2024-08-15 12:17:01,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-15 12:17:12,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3178670.0, ans=0.2 2024-08-15 12:17:14,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13550, loss[loss=0.08006, beats_loss=0.01172, ecapa_loss=0.0001059, whisper_loss=0.06728, over 16409.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001498, whisper_loss=0.0904, over 3876099.47 frames. 
], batch size: 63, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:17:17,643 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.308e+01 2.563e+01 2.825e+01 4.152e+01, threshold=5.126e+01, percent-clipped=0.0 2024-08-15 12:17:38,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3178870.0, ans=0.025 2024-08-15 12:17:53,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3178970.0, ans=0.0 2024-08-15 12:17:57,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3179070.0, ans=0.0 2024-08-15 12:17:58,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2024-08-15 12:18:04,957 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-15 12:18:24,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3179170.0, ans=0.125 2024-08-15 12:18:28,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13600, loss[loss=0.1043, beats_loss=0.0103, ecapa_loss=0.000165, whisper_loss=0.09233, over 16116.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.0001491, whisper_loss=0.08991, over 3861723.22 frames. ], batch size: 66, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:18:38,852 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-15 12:18:42,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3179370.0, ans=0.125 2024-08-15 12:18:43,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179370.0, ans=0.1 2024-08-15 12:18:47,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3179370.0, ans=0.125 2024-08-15 12:18:49,230 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-15 12:18:55,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-15 12:18:55,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=12.0 2024-08-15 12:19:05,381 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 12:19:25,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-15 12:19:34,856 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-15 12:19:41,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13650, loss[loss=0.1103, beats_loss=0.009037, ecapa_loss=0.000151, whisper_loss=0.09978, over 23422.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001506, whisper_loss=0.09051, over 3878435.46 frames. 
], batch size: 94, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:19:44,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.314e+01 2.512e+01 2.853e+01 1.013e+02, threshold=5.025e+01, percent-clipped=2.0 2024-08-15 12:19:56,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3179870.0, ans=0.0 2024-08-15 12:19:58,925 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 12:20:02,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3179870.0, ans=0.0 2024-08-15 12:20:24,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3180070.0, ans=0.04949747468305833 2024-08-15 12:20:30,132 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-15 12:20:37,236 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-15 12:20:44,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-08-15 12:20:51,298 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 12:20:55,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13700, loss[loss=0.1079, beats_loss=0.01, ecapa_loss=0.0001471, whisper_loss=0.09643, over 17903.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001502, whisper_loss=0.09083, over 3867953.77 frames. ], batch size: 73, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:20:55,780 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 13 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 12:21:20,502 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-15 12:21:24,930 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 12:21:29,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3180470.0, ans=0.1 2024-08-15 12:21:30,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3180470.0, ans=0.2 2024-08-15 12:21:36,635 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-15 12:21:47,343 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-15 12:21:53,341 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 12:21:59,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2024-08-15 12:22:02,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3180670.0, ans=0.0 2024-08-15 12:22:06,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-15 12:22:11,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13750, loss[loss=0.1219, beats_loss=0.006314, ecapa_loss=0.0001489, whisper_loss=0.1141, over 15198.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001498, whisper_loss=0.09089, over 3839605.88 frames. 
], batch size: 55, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:22:12,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3180770.0, ans=0.0 2024-08-15 12:22:14,194 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.530e+01 2.885e+01 4.854e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 12:22:42,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3180970.0, ans=0.1 2024-08-15 12:22:43,388 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 12:23:24,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3181270.0, ans=0.0 2024-08-15 12:23:25,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13800, loss[loss=0.08905, beats_loss=0.0118, ecapa_loss=0.0001549, whisper_loss=0.0757, over 21163.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01052, ecapa_loss=0.0001493, whisper_loss=0.09217, over 3851608.82 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:23:26,018 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 12:23:43,944 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-15 12:24:03,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3181470.0, ans=0.2 2024-08-15 12:24:09,099 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 12:24:10,414 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-15 12:24:15,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-15 12:24:19,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3181570.0, ans=0.1 2024-08-15 12:24:20,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3181570.0, ans=0.125 2024-08-15 12:24:40,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13850, loss[loss=0.08399, beats_loss=0.0123, ecapa_loss=0.0001394, whisper_loss=0.0703, over 13859.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001491, whisper_loss=0.09188, over 3829871.65 frames. ], batch size: 57, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:24:43,016 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.369e+01 2.668e+01 2.994e+01 7.332e+01, threshold=5.336e+01, percent-clipped=2.0 2024-08-15 12:24:49,024 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-15 12:25:11,006 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-15 12:25:53,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13900, loss[loss=0.1181, beats_loss=0.009605, ecapa_loss=0.0001704, whisper_loss=0.1067, over 21952.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01051, ecapa_loss=0.0001497, whisper_loss=0.09212, over 3842746.29 frames. ], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:25:54,020 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-15 12:25:57,938 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-15 12:26:06,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2024-08-15 12:26:14,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3182370.0, ans=0.0 2024-08-15 12:26:36,562 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 12:26:38,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3182570.0, ans=0.05 2024-08-15 12:26:46,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3182570.0, ans=0.125 2024-08-15 12:27:00,041 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-15 12:27:04,335 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 38 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-15 12:27:06,728 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 13950, loss[loss=0.1301, beats_loss=0.007792, ecapa_loss=0.0001513, whisper_loss=0.1208, over 22986.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01047, ecapa_loss=0.000151, whisper_loss=0.09227, over 3840809.78 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:27:09,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.273e+01 2.481e+01 2.745e+01 4.473e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-15 12:27:10,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3182770.0, ans=0.125 2024-08-15 12:27:12,323 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-15 12:27:32,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3182870.0, ans=0.0 2024-08-15 12:27:36,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=22.5 2024-08-15 12:27:41,679 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-15 12:27:42,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-08-15 12:27:57,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-15 12:28:20,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14000, loss[loss=0.09797, beats_loss=0.01156, ecapa_loss=0.0001359, whisper_loss=0.08505, over 17287.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001492, whisper_loss=0.09135, over 3817685.68 frames. ], batch size: 68, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:28:38,474 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-15 12:28:47,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3183370.0, ans=0.125 2024-08-15 12:28:57,795 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-15 12:29:28,504 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 12:29:29,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3183670.0, ans=0.125 2024-08-15 12:29:30,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2024-08-15 12:29:34,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14050, loss[loss=0.09761, beats_loss=0.00713, ecapa_loss=0.0001651, whisper_loss=0.08883, over 16903.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01049, ecapa_loss=0.0001498, whisper_loss=0.09176, over 3836822.15 frames. ], batch size: 64, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:29:37,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.178e+01 2.428e+01 2.740e+01 4.100e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-15 12:29:58,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3183870.0, ans=0.125 2024-08-15 12:29:59,766 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-15 12:30:49,430 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:30:50,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14100, loss[loss=0.1168, beats_loss=0.01141, ecapa_loss=0.0001448, whisper_loss=0.104, over 22891.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.000148, whisper_loss=0.09109, over 3854326.33 frames. ], batch size: 91, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:31:24,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-08-15 12:31:31,145 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 12:31:31,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3184470.0, ans=0.125 2024-08-15 12:31:34,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2024-08-15 12:32:03,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14150, loss[loss=0.094, beats_loss=0.0104, ecapa_loss=0.00016, whisper_loss=0.08199, over 21091.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0106, ecapa_loss=0.0001479, whisper_loss=0.09127, over 3876164.45 frames. ], batch size: 89, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:32:06,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.412e+01 2.567e+01 2.890e+01 3.775e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-15 12:32:27,459 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-15 12:32:43,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3184970.0, ans=0.125 2024-08-15 12:32:49,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3185070.0, ans=0.125 2024-08-15 12:32:57,691 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-15 12:33:00,780 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-15 12:33:01,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3185070.0, ans=0.05 2024-08-15 12:33:04,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3185170.0, ans=0.125 2024-08-15 12:33:08,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185170.0, ans=0.1 2024-08-15 12:33:18,063 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.007e-02 2024-08-15 12:33:22,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14200, loss[loss=0.1081, beats_loss=0.009379, ecapa_loss=0.0001496, whisper_loss=0.09725, over 21670.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001472, whisper_loss=0.09165, over 3913199.68 frames. ], batch size: 85, lr: 2.80e-03, grad_scale: 1.152921504606847e+18 2024-08-15 12:33:25,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-08-15 12:33:30,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3185270.0, ans=0.125 2024-08-15 12:34:00,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=22.5 2024-08-15 12:34:03,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3185470.0, ans=0.2 2024-08-15 12:34:22,736 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 12:34:44,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14250, loss[loss=0.1113, beats_loss=0.01001, ecapa_loss=0.0001413, whisper_loss=0.09986, over 23466.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0106, ecapa_loss=0.0001467, whisper_loss=0.09162, over 3914251.60 frames. ], batch size: 94, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:34:49,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.313e+01 2.543e+01 2.810e+01 4.306e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-15 12:34:55,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-15 12:34:56,120 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-15 12:35:08,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3185870.0, ans=0.125 2024-08-15 12:35:29,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3185970.0, ans=0.02 2024-08-15 12:35:29,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3185970.0, ans=10.0 2024-08-15 12:36:05,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3186170.0, ans=0.125 2024-08-15 12:36:23,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14300, loss[loss=0.09265, beats_loss=0.01177, ecapa_loss=0.0001474, whisper_loss=0.0794, over 22240.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001464, whisper_loss=0.09149, over 3911584.92 frames. 
], batch size: 90, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:36:29,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3186270.0, ans=0.125 2024-08-15 12:36:37,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186270.0, ans=0.1 2024-08-15 12:36:46,489 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:36:52,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3186370.0, ans=0.0 2024-08-15 12:36:53,580 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 12:37:18,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3186470.0, ans=0.0 2024-08-15 12:37:34,213 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 19 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-15 12:37:42,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3186670.0, ans=0.125 2024-08-15 12:37:53,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3186670.0, ans=0.125 2024-08-15 12:38:03,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14350, loss[loss=0.09311, beats_loss=0.008707, ecapa_loss=0.0001845, whisper_loss=0.08256, over 14404.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.0001468, whisper_loss=0.09078, over 3910106.63 frames. 
], batch size: 58, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:38:09,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.292e+01 2.515e+01 2.764e+01 5.097e+01, threshold=5.030e+01, percent-clipped=1.0 2024-08-15 12:38:19,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3186770.0, ans=0.0 2024-08-15 12:38:26,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=22.5 2024-08-15 12:38:35,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186870.0, ans=0.1 2024-08-15 12:38:43,213 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 12:39:11,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-15 12:39:24,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3187170.0, ans=0.0 2024-08-15 12:39:44,878 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-15 12:39:46,699 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14400, loss[loss=0.1021, beats_loss=0.009333, ecapa_loss=0.0001666, whisper_loss=0.09115, over 20251.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001487, whisper_loss=0.09125, over 3903496.70 frames. ], batch size: 81, lr: 2.80e-03, grad_scale: 5.764607523034235e+17 2024-08-15 12:40:05,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3187270.0, ans=0.0 2024-08-15 12:40:11,058 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-15 12:40:21,450 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-15 12:40:22,921 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-15 12:40:37,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-15 12:40:39,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3187570.0, ans=0.0 2024-08-15 12:40:45,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-08-15 12:40:59,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3187670.0, ans=0.125 2024-08-15 12:41:06,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 22, batch 14450, loss[loss=0.08074, beats_loss=0.011, ecapa_loss=0.0001695, whisper_loss=0.06804, over 21214.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001502, whisper_loss=0.08978, over 3874765.55 frames. ], batch size: 92, lr: 2.80e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:41:07,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187770.0, ans=0.1 2024-08-15 12:41:07,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2024-08-15 12:41:12,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.363e+01 2.570e+01 2.963e+01 1.669e+02, threshold=5.140e+01, percent-clipped=2.0 2024-08-15 12:41:21,177 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 12:41:41,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3187970.0, ans=0.1 2024-08-15 12:41:42,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3187970.0, ans=0.125 2024-08-15 12:41:45,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3187970.0, ans=0.0 2024-08-15 12:41:53,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3188070.0, ans=0.125 2024-08-15 12:41:58,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3188070.0, ans=0.125 2024-08-15 12:41:59,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-15 12:42:06,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3188170.0, ans=0.0 2024-08-15 12:42:46,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 0, loss[loss=0.1356, beats_loss=0.007475, ecapa_loss=0.0001095, whisper_loss=0.1271, over 21840.00 frames. ], tot_loss[loss=0.1356, beats_loss=0.007475, ecapa_loss=0.0001095, whisper_loss=0.1271, over 21840.00 frames. ], batch size: 76, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:42:46,631 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 12:43:28,490 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2464, over 922467.00 frames. 
2024-08-15 12:43:45,268 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.00428, beats_loss=0, ecapa_loss=0.000428, whisper_loss=0, over 939242.00 frames. 2024-08-15 12:44:29,629 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9459, 4.7138, 4.8415, 4.9114], device='cuda:3') 2024-08-15 12:45:43,920 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 12:45:43,927 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 12:46:00,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-08-15 12:46:26,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3188320.0, ans=0.125 2024-08-15 12:47:22,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-08-15 12:47:42,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3188620.0, ans=0.0 2024-08-15 12:47:45,836 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 12:47:50,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 50, loss[loss=0.08066, beats_loss=0.01059, ecapa_loss=0.0001386, whisper_loss=0.06869, over 21417.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009416, ecapa_loss=0.0001575, whisper_loss=0.09015, over 856877.40 frames. ], batch size: 87, lr: 2.74e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:47:50,467 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 12:48:08,634 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-15 12:48:13,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.413e+01 2.733e+01 3.074e+01 3.899e+01, threshold=5.466e+01, percent-clipped=0.0 2024-08-15 12:49:10,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2024-08-15 12:49:37,312 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 12:49:45,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3189120.0, ans=0.125 2024-08-15 12:49:46,167 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 12:49:49,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 100, loss[loss=0.08804, beats_loss=0.01272, ecapa_loss=0.0001115, whisper_loss=0.07421, over 17601.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009411, ecapa_loss=0.0001531, whisper_loss=0.09093, over 1485411.41 frames. ], batch size: 69, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:50:17,349 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 12:50:41,151 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-15 12:50:43,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3189420.0, ans=0.125 2024-08-15 12:50:43,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3189420.0, ans=0.05 2024-08-15 12:50:46,701 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
18 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-15 12:50:57,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3189420.0, ans=0.125 2024-08-15 12:50:59,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189520.0, ans=0.1 2024-08-15 12:50:59,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3189520.0, ans=0.5 2024-08-15 12:51:03,110 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 12:51:11,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3189520.0, ans=0.1 2024-08-15 12:51:17,420 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 12:51:24,506 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 12:51:41,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 150, loss[loss=0.1013, beats_loss=0.006669, ecapa_loss=0.0001759, whisper_loss=0.09287, over 15095.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009498, ecapa_loss=0.0001516, whisper_loss=0.0899, over 1987184.94 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:51:50,429 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-15 12:51:57,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.541e+01 2.794e+01 3.145e+01 4.567e+01, threshold=5.588e+01, percent-clipped=0.0 2024-08-15 12:51:59,479 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-15 12:52:04,942 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
11 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-15 12:53:02,703 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 12:53:05,914 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 200, loss[loss=0.1101, beats_loss=0.01055, ecapa_loss=0.0001537, whisper_loss=0.09799, over 22867.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009756, ecapa_loss=0.0001519, whisper_loss=0.09051, over 2384375.46 frames. ], batch size: 91, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:53:06,229 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-15 12:53:08,049 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-15 12:53:20,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3190320.0, ans=0.125 2024-08-15 12:53:25,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3190320.0, ans=0.0 2024-08-15 12:53:36,786 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-15 12:53:58,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3190520.0, ans=0.125 2024-08-15 12:54:03,369 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-15 12:54:04,411 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-15 12:54:10,726 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 12:54:16,804 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
25 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 12:54:19,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-15 12:54:20,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3190620.0, ans=0.125 2024-08-15 12:54:24,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2024-08-15 12:54:24,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 250, loss[loss=0.08385, beats_loss=0.01321, ecapa_loss=0.0001002, whisper_loss=0.06964, over 21723.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.00996, ecapa_loss=0.0001512, whisper_loss=0.08958, over 2688189.14 frames. ], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:54:34,487 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-15 12:54:38,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.264e+01 2.507e+01 2.916e+01 4.701e+01, threshold=5.014e+01, percent-clipped=0.0 2024-08-15 12:54:45,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3190820.0, ans=0.1 2024-08-15 12:54:58,799 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-15 12:54:59,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3190920.0, ans=0.1 2024-08-15 12:55:12,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3191020.0, ans=0.125 2024-08-15 12:55:40,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3191220.0, ans=0.125 2024-08-15 12:55:41,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 300, loss[loss=0.08676, beats_loss=0.01303, ecapa_loss=0.0001367, whisper_loss=0.07236, over 18114.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01018, ecapa_loss=0.0001522, whisper_loss=0.08946, over 2920644.33 frames. ], batch size: 69, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:55:43,546 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 12:55:44,696 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 12:55:48,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3191220.0, ans=0.2 2024-08-15 12:55:52,558 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-15 12:55:53,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3191220.0, ans=0.125 2024-08-15 12:55:59,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3191320.0, ans=0.2 2024-08-15 12:56:06,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3191320.0, ans=0.0 2024-08-15 12:56:40,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2024-08-15 12:56:51,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-15 12:56:58,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 350, loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001827, whisper_loss=0.08874, over 15231.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01032, ecapa_loss=0.0001499, whisper_loss=0.08914, over 3130704.01 frames. ], batch size: 62, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:57:10,182 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
34 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-15 12:57:12,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.524e+01 2.862e+01 4.157e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 12:57:23,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191820.0, ans=0.1 2024-08-15 12:57:35,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191920.0, ans=0.1 2024-08-15 12:57:36,380 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-15 12:57:37,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3191920.0, ans=0.2 2024-08-15 12:57:40,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3191920.0, ans=0.0 2024-08-15 12:57:46,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3192020.0, ans=0.125 2024-08-15 12:57:54,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3192020.0, ans=0.05 2024-08-15 12:58:07,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3192120.0, ans=0.5 2024-08-15 12:58:09,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. 
limit=22.5 2024-08-15 12:58:11,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3192120.0, ans=0.0 2024-08-15 12:58:16,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 400, loss[loss=0.1, beats_loss=0.01012, ecapa_loss=0.0001866, whisper_loss=0.08806, over 21303.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001492, whisper_loss=0.08989, over 3278060.59 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:58:19,095 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-15 12:58:26,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3192220.0, ans=0.125 2024-08-15 12:58:53,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3192420.0, ans=0.125 2024-08-15 12:59:24,024 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-15 12:59:28,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3192620.0, ans=0.0 2024-08-15 12:59:29,475 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-15 12:59:33,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-15 12:59:35,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 450, loss[loss=0.1204, beats_loss=0.01121, ecapa_loss=0.0001289, whisper_loss=0.1079, over 19643.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01028, ecapa_loss=0.0001489, whisper_loss=0.0907, over 3416954.15 frames. 
], batch size: 73, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 12:59:42,624 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 29 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 12:59:49,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.262e+01 2.454e+01 2.784e+01 4.737e+01, threshold=4.907e+01, percent-clipped=0.0 2024-08-15 12:59:54,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3192820.0, ans=0.125 2024-08-15 13:00:13,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3192920.0, ans=0.125 2024-08-15 13:00:24,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3193020.0, ans=0.125 2024-08-15 13:00:26,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3193020.0, ans=0.0 2024-08-15 13:00:34,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3193020.0, ans=0.025 2024-08-15 13:00:36,035 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-15 13:00:42,834 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 13:00:58,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 500, loss[loss=0.06809, beats_loss=0.01195, ecapa_loss=0.000129, whisper_loss=0.05484, over 14479.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001487, whisper_loss=0.09004, over 3519426.47 frames. ], batch size: 58, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:01:08,414 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 13:01:09,743 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-15 13:01:16,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3193320.0, ans=0.0 2024-08-15 13:01:38,016 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-15 13:01:50,593 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-15 13:01:51,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2024-08-15 13:02:30,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 550, loss[loss=0.1138, beats_loss=0.01101, ecapa_loss=0.0001075, whisper_loss=0.1018, over 19503.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001471, whisper_loss=0.09052, over 3589617.54 frames. ], batch size: 73, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:02:45,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.304e+01 2.513e+01 2.793e+01 3.514e+01, threshold=5.025e+01, percent-clipped=0.0 2024-08-15 13:02:49,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3193820.0, ans=0.1 2024-08-15 13:02:54,604 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-15 13:03:11,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3193920.0, ans=0.125 2024-08-15 13:03:19,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0 2024-08-15 13:03:24,880 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-15 13:03:25,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3194020.0, ans=0.2 2024-08-15 13:03:56,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 600, loss[loss=0.09174, beats_loss=0.01184, ecapa_loss=0.0001247, whisper_loss=0.07865, over 17994.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001466, whisper_loss=0.0903, over 3660161.56 frames. ], batch size: 72, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:03:59,982 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 13:04:09,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3194220.0, ans=0.1 2024-08-15 13:04:14,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3194320.0, ans=0.1 2024-08-15 13:04:20,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3194320.0, ans=0.0 2024-08-15 13:04:31,366 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-15 13:04:33,878 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 13:04:35,403 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 13:04:37,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.31 vs. 
limit=22.5 2024-08-15 13:04:49,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3194520.0, ans=0.125 2024-08-15 13:04:50,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3194520.0, ans=0.125 2024-08-15 13:04:55,209 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-15 13:04:59,315 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 13:05:07,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 650, loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001543, whisper_loss=0.09087, over 18137.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001474, whisper_loss=0.09042, over 3687317.16 frames. ], batch size: 72, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:05:18,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.357e+01 2.569e+01 2.869e+01 2.947e+02, threshold=5.138e+01, percent-clipped=4.0 2024-08-15 13:05:31,824 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 13:05:50,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3195020.0, ans=0.125 2024-08-15 13:06:01,950 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-15 13:06:06,172 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-15 13:06:12,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 700, loss[loss=0.09977, beats_loss=0.01258, ecapa_loss=0.0001264, whisper_loss=0.08593, over 21735.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001472, whisper_loss=0.08994, over 3739281.98 frames. 
], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:06:20,203 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:06:27,535 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 14 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 13:06:35,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3195320.0, ans=0.125 2024-08-15 13:06:46,311 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-15 13:06:51,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3195520.0, ans=0.125 2024-08-15 13:07:01,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5 2024-08-15 13:07:12,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3195620.0, ans=0.2 2024-08-15 13:07:16,511 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 750, loss[loss=0.1127, beats_loss=0.01029, ecapa_loss=0.000123, whisper_loss=0.1012, over 16673.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001459, whisper_loss=0.08942, over 3745824.35 frames. 
], batch size: 65, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:07:28,509 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.327e+01 2.582e+01 2.848e+01 1.200e+02, threshold=5.164e+01, percent-clipped=2.0 2024-08-15 13:07:41,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3195920.0, ans=0.125 2024-08-15 13:07:55,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3196020.0, ans=0.0 2024-08-15 13:07:55,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-15 13:08:05,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3196020.0, ans=0.0 2024-08-15 13:08:05,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3196020.0, ans=0.125 2024-08-15 13:08:15,457 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 13:08:21,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 800, loss[loss=0.08184, beats_loss=0.01169, ecapa_loss=0.0001355, whisper_loss=0.06879, over 14918.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001463, whisper_loss=0.0898, over 3764027.55 frames. ], batch size: 61, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:08:21,793 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-15 13:08:31,179 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-15 13:08:31,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3196220.0, ans=0.2 2024-08-15 13:08:37,593 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-15 13:08:47,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3196420.0, ans=0.09899494936611666 2024-08-15 13:08:54,944 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-15 13:08:58,759 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 13:09:17,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196620.0, ans=0.1 2024-08-15 13:09:27,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 850, loss[loss=0.1242, beats_loss=0.00842, ecapa_loss=0.000149, whisper_loss=0.1143, over 22402.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001464, whisper_loss=0.08906, over 3763749.94 frames. ], batch size: 86, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:09:35,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3196720.0, ans=0.125 2024-08-15 13:09:38,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.346e+01 2.636e+01 2.893e+01 3.086e+02, threshold=5.271e+01, percent-clipped=3.0 2024-08-15 13:09:39,103 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
22 from LS+wenet, 22 from Vox, 48 from AS 2024-08-15 13:09:56,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3196920.0, ans=0.125 2024-08-15 13:09:59,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3196920.0, ans=0.5 2024-08-15 13:10:09,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3197020.0, ans=0.125 2024-08-15 13:10:11,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3197020.0, ans=0.1 2024-08-15 13:10:17,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197020.0, ans=0.1 2024-08-15 13:10:23,963 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:10:27,574 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 from AS 2024-08-15 13:10:33,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 900, loss[loss=0.09952, beats_loss=0.01091, ecapa_loss=0.0001473, whisper_loss=0.08713, over 22267.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001453, whisper_loss=0.08887, over 3766118.67 frames. ], batch size: 89, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:10:40,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.17 vs. 
limit=10.0 2024-08-15 13:10:43,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3197220.0, ans=0.0 2024-08-15 13:10:58,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.94 vs. limit=22.5 2024-08-15 13:10:59,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197420.0, ans=0.1 2024-08-15 13:11:00,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3197420.0, ans=0.125 2024-08-15 13:11:13,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3197520.0, ans=0.0 2024-08-15 13:11:31,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3197620.0, ans=0.125 2024-08-15 13:11:32,107 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 13:11:38,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 950, loss[loss=0.0995, beats_loss=0.01102, ecapa_loss=0.0001941, whisper_loss=0.08653, over 18154.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001464, whisper_loss=0.08914, over 3788607.34 frames. ], batch size: 74, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:11:48,747 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 from AS 2024-08-15 13:11:50,132 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.292e+01 2.595e+01 2.867e+01 1.968e+02, threshold=5.190e+01, percent-clipped=1.0 2024-08-15 13:12:05,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.05 vs. 
limit=15.0 2024-08-15 13:12:44,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1000, loss[loss=0.1095, beats_loss=0.009122, ecapa_loss=0.000166, whisper_loss=0.09875, over 22662.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001458, whisper_loss=0.08868, over 3766747.06 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:13:06,835 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 from AS 2024-08-15 13:13:19,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198420.0, ans=0.1 2024-08-15 13:13:33,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198520.0, ans=0.1 2024-08-15 13:13:38,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2024-08-15 13:13:39,769 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 19 from Vox, 47 from AS 2024-08-15 13:13:49,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1050, loss[loss=0.07991, beats_loss=0.009487, ecapa_loss=0.0001311, whisper_loss=0.06911, over 14329.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001457, whisper_loss=0.08915, over 3815335.68 frames. ], batch size: 54, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:13:52,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. 
limit=15.0 2024-08-15 13:14:01,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.361e+01 2.585e+01 2.930e+01 4.862e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 13:14:06,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3198820.0, ans=0.125 2024-08-15 13:14:13,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3198820.0, ans=0.125 2024-08-15 13:14:27,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3199020.0, ans=0.1 2024-08-15 13:14:27,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3199020.0, ans=0.125 2024-08-15 13:14:33,760 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-15 13:14:48,182 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 26 from Vox, 28 from AS 2024-08-15 13:14:51,931 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS 2024-08-15 13:14:54,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1100, loss[loss=0.08769, beats_loss=0.01145, ecapa_loss=0.0001303, whisper_loss=0.07493, over 21928.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001457, whisper_loss=0.08949, over 3842913.89 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:14:54,456 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 from AS 2024-08-15 13:14:56,176 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
38 from LS+wenet, 24 from Vox, 30 from AS 2024-08-15 13:15:07,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3199320.0, ans=0.05 2024-08-15 13:15:16,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3199320.0, ans=0.125 2024-08-15 13:15:23,075 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 13:15:42,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3199520.0, ans=0.125 2024-08-15 13:15:44,503 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 23 from Vox, 23 from AS 2024-08-15 13:15:59,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1150, loss[loss=0.09221, beats_loss=0.01148, ecapa_loss=0.0001681, whisper_loss=0.07904, over 16621.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001463, whisper_loss=0.08986, over 3861767.27 frames. ], batch size: 66, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:16:11,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.313e+01 2.590e+01 2.898e+01 5.614e+01, threshold=5.180e+01, percent-clipped=1.0 2024-08-15 13:16:15,488 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 from AS 2024-08-15 13:16:18,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3199820.0, ans=0.125 2024-08-15 13:16:27,939 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 from AS 2024-08-15 13:16:29,166 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 from AS 2024-08-15 13:17:04,240 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
20 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 13:17:06,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3200120.0, ans=0.2 2024-08-15 13:17:09,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1200, loss[loss=0.06832, beats_loss=0.01371, ecapa_loss=0.0001485, whisper_loss=0.05312, over 16342.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001468, whisper_loss=0.08893, over 3853689.71 frames. ], batch size: 70, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:17:10,745 WARNING [optim.py:496] (3/4) Scaling gradients by 0.052070412784814835, model_norm_threshold=51.8048095703125 2024-08-15 13:17:10,927 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.791e+07, orig_rms_sq=1.005e-02 2024-08-15 13:17:12,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3200220.0, ans=0.0 2024-08-15 13:17:16,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3200220.0, ans=0.125 2024-08-15 13:17:52,319 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 12 from Vox, 29 from AS 2024-08-15 13:17:53,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3200520.0, ans=0.0 2024-08-15 13:18:00,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3200520.0, ans=0.125 2024-08-15 13:18:01,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3200620.0, ans=0.0 2024-08-15 13:18:15,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1250, loss[loss=0.1087, beats_loss=0.0109, ecapa_loss=0.0001348, whisper_loss=0.09649, over 22074.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01057, ecapa_loss=0.0001468, whisper_loss=0.08862, over 3813941.70 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:18:25,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3200720.0, ans=0.95 2024-08-15 13:18:27,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.258e+01 2.452e+01 2.719e+01 9.949e+02, threshold=4.904e+01, percent-clipped=2.0 2024-08-15 13:18:47,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3200920.0, ans=0.125 2024-08-15 13:18:54,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=12.0 2024-08-15 13:18:56,684 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
35 from LS+wenet, 26 from Vox, 30 from AS 2024-08-15 13:18:57,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3201020.0, ans=0.125 2024-08-15 13:19:03,585 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.264e-01 2024-08-15 13:19:10,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3201120.0, ans=0.2 2024-08-15 13:19:20,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3201220.0, ans=0.2 2024-08-15 13:19:21,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1300, loss[loss=0.07984, beats_loss=0.01128, ecapa_loss=0.0001347, whisper_loss=0.06721, over 17744.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001469, whisper_loss=0.0893, over 3839976.42 frames. ], batch size: 70, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:19:49,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3201420.0, ans=0.125 2024-08-15 13:20:00,793 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS 2024-08-15 13:20:13,205 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 from AS 2024-08-15 13:20:25,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2024-08-15 13:20:27,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1350, loss[loss=0.1226, beats_loss=0.008159, ecapa_loss=0.0001251, whisper_loss=0.1132, over 17940.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001462, whisper_loss=0.08914, over 3827905.76 frames. 
], batch size: 62, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:20:31,494 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 from AS 2024-08-15 13:20:39,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.225e+01 2.528e+01 2.736e+01 6.244e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-15 13:20:39,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3201820.0, ans=0.07 2024-08-15 13:20:42,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3201820.0, ans=0.2 2024-08-15 13:20:43,170 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS 2024-08-15 13:20:46,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-15 13:21:34,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3202220.0, ans=0.1 2024-08-15 13:21:34,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1400, loss[loss=0.1089, beats_loss=0.01024, ecapa_loss=0.0001605, whisper_loss=0.09709, over 22482.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001456, whisper_loss=0.09, over 3831876.38 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:21:41,969 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 13:22:03,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3202420.0, ans=0.0 2024-08-15 13:22:14,334 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
21 from LS+wenet, 15 from Vox, 31 from AS 2024-08-15 13:22:20,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-15 13:22:36,944 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 from AS 2024-08-15 13:22:47,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1450, loss[loss=0.09612, beats_loss=0.00935, ecapa_loss=0.0001642, whisper_loss=0.08512, over 14775.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001457, whisper_loss=0.08973, over 3803080.02 frames. ], batch size: 59, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:23:24,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.256e+01 2.495e+01 2.819e+01 4.681e+02, threshold=4.990e+01, percent-clipped=2.0 2024-08-15 13:23:24,224 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 12 from Vox, 30 from AS 2024-08-15 13:23:24,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3202820.0, ans=0.0 2024-08-15 13:23:32,031 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 13:23:47,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3202920.0, ans=0.125 2024-08-15 13:24:03,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3203020.0, ans=0.125 2024-08-15 13:24:04,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3203020.0, ans=0.5 2024-08-15 13:24:07,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-08-15 13:24:11,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3203120.0, ans=0.125 2024-08-15 13:24:25,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1500, loss[loss=0.07494, beats_loss=0.01105, ecapa_loss=0.0001634, whisper_loss=0.06226, over 18067.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001461, whisper_loss=0.08949, over 3830783.12 frames. ], batch size: 76, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:24:28,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-08-15 13:24:37,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. 
limit=22.5 2024-08-15 13:25:08,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3203520.0, ans=0.2 2024-08-15 13:25:18,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3203520.0, ans=0.0 2024-08-15 13:25:19,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3203520.0, ans=0.125 2024-08-15 13:25:23,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3203620.0, ans=0.0 2024-08-15 13:25:38,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1550, loss[loss=0.06598, beats_loss=0.01381, ecapa_loss=0.0001501, whisper_loss=0.05067, over 21392.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.08926, over 3805811.16 frames. ], batch size: 93, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:25:51,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.256e+01 2.497e+01 2.794e+01 4.870e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-15 13:26:16,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3203920.0, ans=0.125 2024-08-15 13:26:25,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3204020.0, ans=0.1 2024-08-15 13:26:40,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2024-08-15 13:26:47,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3204120.0, ans=0.0 2024-08-15 13:26:48,614 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 23 from Vox, 32 from AS 2024-08-15 13:26:54,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1600, loss[loss=0.09898, beats_loss=0.0086, ecapa_loss=0.0001484, whisper_loss=0.0889, over 18272.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.08987, over 3845292.94 frames. ], batch size: 74, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:26:58,905 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 from AS 2024-08-15 13:27:18,865 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-15 13:27:32,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3204420.0, ans=0.125 2024-08-15 13:27:37,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3204520.0, ans=0.2 2024-08-15 13:27:37,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3204520.0, ans=0.125 2024-08-15 13:27:44,681 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 from AS 2024-08-15 13:27:48,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3204520.0, ans=22.5 2024-08-15 13:27:50,867 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-15 13:27:51,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3204520.0, ans=0.125 2024-08-15 13:27:56,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3204620.0, ans=0.125 2024-08-15 13:27:56,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3204620.0, ans=0.0 2024-08-15 13:28:04,838 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 from AS 2024-08-15 13:28:07,774 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 from AS 2024-08-15 13:28:08,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1650, loss[loss=0.08772, beats_loss=0.01194, ecapa_loss=0.0001275, whisper_loss=0.0745, over 23101.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001459, whisper_loss=0.08963, over 3856055.59 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:28:12,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3204720.0, ans=0.125 2024-08-15 13:28:20,414 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
19 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 13:28:21,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.261e+01 2.464e+01 2.812e+01 1.426e+02, threshold=4.927e+01, percent-clipped=1.0 2024-08-15 13:28:42,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3204920.0, ans=0.1 2024-08-15 13:29:18,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2024-08-15 13:29:22,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1700, loss[loss=0.093, beats_loss=0.0109, ecapa_loss=0.0001515, whisper_loss=0.08058, over 22128.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001459, whisper_loss=0.0897, over 3839622.07 frames. ], batch size: 91, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:29:23,305 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 13:29:41,859 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 24 from Vox, 27 from AS 2024-08-15 13:30:22,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3205620.0, ans=0.0 2024-08-15 13:30:23,639 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 13:30:27,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3205620.0, ans=0.125 2024-08-15 13:30:31,709 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 13:30:33,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3205620.0, ans=0.2 2024-08-15 13:30:38,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1750, loss[loss=0.08687, beats_loss=0.01196, ecapa_loss=0.0001426, whisper_loss=0.07348, over 17541.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001458, whisper_loss=0.08969, over 3845986.76 frames. ], batch size: 72, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:30:51,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.279e+01 2.476e+01 2.729e+01 6.838e+01, threshold=4.951e+01, percent-clipped=2.0 2024-08-15 13:30:53,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3205820.0, ans=0.1 2024-08-15 13:30:54,929 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 from AS 2024-08-15 13:30:56,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-15 13:30:58,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3205820.0, ans=0.0 2024-08-15 13:31:01,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3205820.0, ans=0.125 2024-08-15 13:31:11,700 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 from AS 2024-08-15 13:31:28,949 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 from AS 2024-08-15 13:31:33,172 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
25 from LS+wenet, 19 from Vox, 27 from AS 2024-08-15 13:31:43,202 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 26 from Vox, 46 from AS 2024-08-15 13:31:53,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1800, loss[loss=0.09826, beats_loss=0.01264, ecapa_loss=0.0001603, whisper_loss=0.08402, over 21181.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001463, whisper_loss=0.08953, over 3820258.00 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:31:55,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3206220.0, ans=0.125 2024-08-15 13:31:57,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-15 13:32:02,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3206220.0, ans=0.0 2024-08-15 13:32:09,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3206320.0, ans=0.2 2024-08-15 13:32:10,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-15 13:32:28,381 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 13:32:32,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3206420.0, ans=0.0 2024-08-15 13:32:37,097 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
26 from LS+wenet, 14 from Vox, 23 from AS 2024-08-15 13:32:45,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3206520.0, ans=0.0 2024-08-15 13:32:47,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0 2024-08-15 13:32:56,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3206620.0, ans=0.125 2024-08-15 13:33:06,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1850, loss[loss=0.07004, beats_loss=0.01234, ecapa_loss=0.0001396, whisper_loss=0.0563, over 17177.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001449, whisper_loss=0.08949, over 3822913.73 frames. ], batch size: 69, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:33:20,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.257e+01 2.507e+01 2.743e+01 3.719e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-15 13:33:29,134 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 36 from LS+wenet, 18 from Vox, 30 from AS 2024-08-15 13:33:40,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3206920.0, ans=0.0 2024-08-15 13:33:41,776 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 16 from Vox, 32 from AS 2024-08-15 13:33:44,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3206920.0, ans=15.0 2024-08-15 13:33:48,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3206920.0, ans=0.125 2024-08-15 13:33:50,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3207020.0, ans=0.125 2024-08-15 13:33:59,689 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 19 from Vox, 31 from AS 2024-08-15 13:34:01,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207020.0, ans=0.1 2024-08-15 13:34:03,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3207020.0, ans=0.125 2024-08-15 13:34:09,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3207120.0, ans=0.1 2024-08-15 13:34:14,916 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 13:34:21,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1900, loss[loss=0.1015, beats_loss=0.01201, ecapa_loss=0.0001399, whisper_loss=0.08812, over 21756.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001456, whisper_loss=0.08948, over 3828841.72 frames. ], batch size: 90, lr: 2.73e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 13:34:28,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. 
limit=12.0 2024-08-15 13:34:39,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3207320.0, ans=0.0 2024-08-15 13:35:12,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3207520.0, ans=0.0 2024-08-15 13:35:25,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-15 13:35:29,293 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 from AS 2024-08-15 13:35:36,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1950, loss[loss=0.1072, beats_loss=0.01086, ecapa_loss=0.0001508, whisper_loss=0.09485, over 15912.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001463, whisper_loss=0.08963, over 3808008.41 frames. ], batch size: 66, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:35:40,483 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 from AS 2024-08-15 13:35:45,970 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 13 from Vox, 46 from AS 2024-08-15 13:35:49,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.350e+01 2.550e+01 2.908e+01 4.451e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-15 13:35:50,078 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS 2024-08-15 13:36:04,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3207820.0, ans=0.125 2024-08-15 13:36:04,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3207820.0, ans=0.05 2024-08-15 13:36:07,012 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 13:36:43,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2024-08-15 13:36:48,338 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 from AS 2024-08-15 13:36:48,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3208120.0, ans=0.0 2024-08-15 13:36:50,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2000, loss[loss=0.09685, beats_loss=0.01226, ecapa_loss=0.0001544, whisper_loss=0.08304, over 17219.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.0001458, whisper_loss=0.08891, over 3812243.90 frames. ], batch size: 71, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:36:55,240 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 19 from Vox, 30 from AS 2024-08-15 13:36:56,835 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 from AS 2024-08-15 13:36:58,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-08-15 13:37:07,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3208320.0, ans=0.0 2024-08-15 13:37:10,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3208320.0, ans=12.0 2024-08-15 13:37:18,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3208320.0, ans=0.125 2024-08-15 13:37:21,099 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
26 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 13:37:27,486 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS 2024-08-15 13:37:27,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=12.0 2024-08-15 13:37:40,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-15 13:38:08,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2050, loss[loss=0.08884, beats_loss=0.01258, ecapa_loss=0.0001466, whisper_loss=0.07479, over 20968.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.0001457, whisper_loss=0.08938, over 3816431.34 frames. ], batch size: 88, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:38:22,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.501e+01 2.773e+01 1.854e+02, threshold=5.002e+01, percent-clipped=2.0 2024-08-15 13:38:31,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3208820.0, ans=0.125 2024-08-15 13:38:52,455 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 from AS 2024-08-15 13:39:01,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3209020.0, ans=0.2 2024-08-15 13:39:21,492 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 from AS 2024-08-15 13:39:22,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2100, loss[loss=0.08516, beats_loss=0.01189, ecapa_loss=0.0001359, whisper_loss=0.07191, over 22519.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01066, ecapa_loss=0.0001454, whisper_loss=0.08937, over 3823000.19 frames. 
], batch size: 91, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:39:26,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-15 13:39:34,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209220.0, ans=0.1 2024-08-15 13:40:16,115 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 13:40:16,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209520.0, ans=0.1 2024-08-15 13:40:16,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3209520.0, ans=0.125 2024-08-15 13:40:33,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3209620.0, ans=0.0 2024-08-15 13:40:35,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2024-08-15 13:40:35,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2150, loss[loss=0.1013, beats_loss=0.008793, ecapa_loss=0.0001319, whisper_loss=0.09116, over 17880.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001465, whisper_loss=0.09044, over 3810778.53 frames. ], batch size: 66, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:40:36,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3209720.0, ans=0.125 2024-08-15 13:40:42,042 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 13:40:49,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.347e+01 2.631e+01 2.979e+01 4.158e+01, threshold=5.262e+01, percent-clipped=0.0 2024-08-15 13:40:53,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3209820.0, ans=0.0 2024-08-15 13:41:06,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-08-15 13:41:20,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3210020.0, ans=0.125 2024-08-15 13:41:32,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3210020.0, ans=0.0 2024-08-15 13:41:41,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3210120.0, ans=0.125 2024-08-15 13:41:49,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2200, loss[loss=0.1096, beats_loss=0.0116, ecapa_loss=0.00013, whisper_loss=0.09675, over 19758.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.000146, whisper_loss=0.09064, over 3821295.54 frames. ], batch size: 75, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:42:02,059 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-15 13:42:10,821 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 13:42:17,854 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-15 13:42:22,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3210420.0, ans=0.0 2024-08-15 13:42:41,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2024-08-15 13:42:42,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3210520.0, ans=0.0 2024-08-15 13:43:04,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2250, loss[loss=0.1133, beats_loss=0.009523, ecapa_loss=0.0001599, whisper_loss=0.1022, over 22198.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001453, whisper_loss=0.09023, over 3837108.49 frames. ], batch size: 92, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:43:07,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=15.0 2024-08-15 13:43:10,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. 
limit=15.0 2024-08-15 13:43:17,993 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.315e+01 2.592e+01 2.973e+01 1.052e+02, threshold=5.184e+01, percent-clipped=4.0 2024-08-15 13:43:35,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3210920.0, ans=0.0 2024-08-15 13:43:47,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3210920.0, ans=0.0 2024-08-15 13:43:53,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3211020.0, ans=0.0 2024-08-15 13:44:17,663 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-15 13:44:21,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2300, loss[loss=0.1156, beats_loss=0.007626, ecapa_loss=0.0001681, whisper_loss=0.1063, over 17062.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09078, over 3838438.61 frames. ], batch size: 68, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:44:25,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3211220.0, ans=0.125 2024-08-15 13:44:46,972 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 13:44:49,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3211320.0, ans=0.125 2024-08-15 13:44:55,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3211420.0, ans=0.1 2024-08-15 13:44:55,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3211420.0, ans=0.5 2024-08-15 13:45:02,077 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-15 13:45:24,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3211520.0, ans=0.125 2024-08-15 13:45:47,941 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2350, loss[loss=0.08479, beats_loss=0.009574, ecapa_loss=0.0001711, whisper_loss=0.07351, over 16104.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001467, whisper_loss=0.09077, over 3821779.20 frames. ], batch size: 65, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:45:53,365 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-15 13:46:02,493 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-15 13:46:03,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.354e+01 2.614e+01 2.902e+01 1.801e+02, threshold=5.228e+01, percent-clipped=1.0 2024-08-15 13:46:04,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3211820.0, ans=0.125 2024-08-15 13:46:14,965 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0751657783985138, model_norm_threshold=52.2847900390625 2024-08-15 13:46:15,133 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.469e+04, grad_sumsq=8.469e+04, orig_rms_sq=1.000e+00 2024-08-15 13:47:04,797 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.053e+01 2024-08-15 13:47:09,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3212120.0, ans=0.125 2024-08-15 13:47:13,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2400, loss[loss=0.08621, beats_loss=0.01299, ecapa_loss=0.000172, whisper_loss=0.0715, over 19270.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001485, whisper_loss=0.09115, over 3822490.83 frames. ], batch size: 81, lr: 2.73e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:47:26,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3212220.0, ans=0.0 2024-08-15 13:47:30,784 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.090e+01 2024-08-15 13:47:52,723 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
26 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-15 13:47:54,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3212420.0, ans=0.0 2024-08-15 13:48:05,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3212520.0, ans=0.125 2024-08-15 13:48:10,308 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-15 13:48:10,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3212520.0, ans=0.125 2024-08-15 13:48:23,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212620.0, ans=0.125 2024-08-15 13:48:23,653 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:48:24,471 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-15 13:48:33,106 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-15 13:48:35,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2450, loss[loss=0.1029, beats_loss=0.009551, ecapa_loss=0.0001667, whisper_loss=0.09166, over 18241.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001484, whisper_loss=0.09061, over 3821027.38 frames. 
], batch size: 73, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:48:41,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3212720.0, ans=0.125 2024-08-15 13:48:48,782 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 13:48:51,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.196e+01 2.471e+01 2.708e+01 6.956e+02, threshold=4.941e+01, percent-clipped=1.0 2024-08-15 13:49:11,691 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-15 13:49:21,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3212920.0, ans=0.125 2024-08-15 13:49:24,407 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-15 13:49:30,650 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-15 13:49:43,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3213120.0, ans=0.0 2024-08-15 13:49:57,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2500, loss[loss=0.08799, beats_loss=0.01191, ecapa_loss=0.0001264, whisper_loss=0.07481, over 16825.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001481, whisper_loss=0.09006, over 3821743.69 frames. ], batch size: 64, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:50:01,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. 
limit=15.0 2024-08-15 13:50:12,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3213220.0, ans=0.0 2024-08-15 13:50:14,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-15 13:50:26,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3213320.0, ans=0.0 2024-08-15 13:50:34,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3213420.0, ans=0.0 2024-08-15 13:50:49,909 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 13:50:53,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-08-15 13:51:10,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3213620.0, ans=0.1 2024-08-15 13:51:19,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=12.0 2024-08-15 13:51:22,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2550, loss[loss=0.07482, beats_loss=0.0128, ecapa_loss=0.0001473, whisper_loss=0.06055, over 20336.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001475, whisper_loss=0.09076, over 3846706.44 frames. ], batch size: 86, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:51:31,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3213720.0, ans=0.0 2024-08-15 13:51:31,957 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-15 13:51:38,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.247e+01 2.527e+01 2.799e+01 4.421e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-15 13:51:53,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3213820.0, ans=0.0 2024-08-15 13:52:18,097 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 13:52:35,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3214120.0, ans=0.125 2024-08-15 13:52:35,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2024-08-15 13:52:38,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3214120.0, ans=0.125 2024-08-15 13:52:40,115 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 13:52:51,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2600, loss[loss=0.1202, beats_loss=0.008628, ecapa_loss=0.0001389, whisper_loss=0.1102, over 22171.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0104, ecapa_loss=0.0001478, whisper_loss=0.09164, over 3865889.92 frames. ], batch size: 86, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:53:02,008 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-15 13:53:05,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3214220.0, ans=0.125 2024-08-15 13:53:25,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3214420.0, ans=0.125 2024-08-15 13:53:31,572 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 13:53:38,059 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-15 13:53:40,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3214420.0, ans=0.125 2024-08-15 13:53:48,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3214520.0, ans=0.05 2024-08-15 13:53:49,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-15 13:53:53,351 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 13:53:57,940 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-15 13:54:17,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2650, loss[loss=0.1104, beats_loss=0.01013, ecapa_loss=0.0001188, whisper_loss=0.09911, over 23249.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001483, whisper_loss=0.09102, over 3858745.69 frames. 
], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:54:28,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3214720.0, ans=0.0 2024-08-15 13:54:31,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-15 13:54:32,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.308e+01 2.516e+01 2.935e+01 7.349e+01, threshold=5.032e+01, percent-clipped=1.0 2024-08-15 13:54:34,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3214820.0, ans=0.1 2024-08-15 13:54:39,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-15 13:54:49,935 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-15 13:55:34,219 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 13:55:41,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2700, loss[loss=0.1005, beats_loss=0.008715, ecapa_loss=0.0001514, whisper_loss=0.0903, over 15210.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001485, whisper_loss=0.09073, over 3849728.59 frames. ], batch size: 58, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:55:44,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3215220.0, ans=0.1 2024-08-15 13:55:48,872 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-15 13:55:50,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3215220.0, ans=0.2 2024-08-15 13:55:52,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3215220.0, ans=0.125 2024-08-15 13:56:06,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=12.0 2024-08-15 13:56:15,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3215420.0, ans=0.125 2024-08-15 13:56:19,861 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 27 from Vox, 17 fro AS 2024-08-15 13:56:29,127 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-15 13:56:54,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3215620.0, ans=0.1 2024-08-15 13:56:58,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3215620.0, ans=0.1 2024-08-15 13:57:00,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=12.0 2024-08-15 13:57:02,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3215620.0, ans=0.125 2024-08-15 13:57:05,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3215620.0, ans=0.1 2024-08-15 13:57:08,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2750, loss[loss=0.09973, beats_loss=0.01117, ecapa_loss=0.0001072, whisper_loss=0.08749, over 20369.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001482, whisper_loss=0.09007, over 3849725.90 frames. ], batch size: 78, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:57:11,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3215720.0, ans=0.0 2024-08-15 13:57:12,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-08-15 13:57:23,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.384e+01 2.723e+01 3.158e+01 5.499e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-15 13:57:24,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3215820.0, ans=0.2 2024-08-15 13:57:24,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3215820.0, ans=0.04949747468305833 2024-08-15 13:57:31,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3215820.0, ans=0.0 2024-08-15 13:57:36,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3215820.0, ans=0.1 2024-08-15 13:57:36,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3215820.0, ans=0.1 2024-08-15 13:57:44,090 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-15 13:58:03,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3216020.0, ans=0.1 2024-08-15 13:58:35,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2800, loss[loss=0.0929, beats_loss=0.009456, ecapa_loss=0.0001582, whisper_loss=0.08186, over 15827.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001484, whisper_loss=0.08994, over 3839284.96 frames. ], batch size: 62, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 13:58:38,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3216220.0, ans=0.1 2024-08-15 13:58:39,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.66 vs. limit=10.0 2024-08-15 13:59:16,770 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-15 13:59:22,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3216420.0, ans=0.125 2024-08-15 13:59:29,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3216520.0, ans=0.125 2024-08-15 13:59:37,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3216520.0, ans=0.125 2024-08-15 13:59:40,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=12.0 2024-08-15 13:59:46,276 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-15 13:59:57,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3216620.0, ans=0.125 2024-08-15 14:00:02,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2850, loss[loss=0.09547, beats_loss=0.01189, ecapa_loss=0.0001284, whisper_loss=0.0823, over 19357.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.000148, whisper_loss=0.0899, over 3851216.85 frames. ], batch size: 78, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:00:19,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.378e+01 2.685e+01 2.976e+01 3.795e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-15 14:00:26,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3216820.0, ans=0.0 2024-08-15 14:00:28,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3216820.0, ans=0.125 2024-08-15 14:00:29,578 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-15 14:00:52,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3216920.0, ans=0.125 2024-08-15 14:01:04,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3217020.0, ans=0.125 2024-08-15 14:01:20,109 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-15 14:01:30,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2900, loss[loss=0.1119, beats_loss=0.01131, ecapa_loss=0.0001404, whisper_loss=0.0992, over 23645.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001487, whisper_loss=0.0907, over 3876762.29 frames. 
], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:01:36,015 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-15 14:01:39,292 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 14:01:39,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3217220.0, ans=0.125 2024-08-15 14:01:40,908 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 14:02:31,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3217520.0, ans=0.0 2024-08-15 14:02:41,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3217620.0, ans=0.1 2024-08-15 14:02:47,558 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 17 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-15 14:02:51,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2950, loss[loss=0.1435, beats_loss=0.00749, ecapa_loss=0.0001232, whisper_loss=0.1347, over 24523.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001481, whisper_loss=0.09066, over 3883744.69 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:02:57,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217720.0, ans=0.1 2024-08-15 14:03:06,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.332e+01 2.577e+01 2.863e+01 4.280e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-15 14:03:11,676 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-15 14:03:24,521 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
36 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 14:03:25,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2024-08-15 14:03:42,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3218020.0, ans=0.125 2024-08-15 14:03:46,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3218020.0, ans=0.125 2024-08-15 14:03:50,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5 2024-08-15 14:03:59,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.15 vs. limit=22.5 2024-08-15 14:04:01,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-15 14:04:11,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218120.0, ans=0.1 2024-08-15 14:04:16,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3000, loss[loss=0.1094, beats_loss=0.01143, ecapa_loss=0.0001607, whisper_loss=0.09641, over 18655.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001481, whisper_loss=0.09096, over 3893158.18 frames. ], batch size: 77, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:04:16,019 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 14:04:54,951 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005381, whisper_loss=0.2469, over 922467.00 frames. 
2024-08-15 14:05:14,492 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004148, beats_loss=0, ecapa_loss=0.0004148, whisper_loss=0, over 939242.00 frames. 2024-08-15 14:06:18,484 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8958, 4.0725, 4.6487, 4.7846], device='cuda:3') 2024-08-15 14:07:09,255 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02341, beats_loss=0.02341, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 14:07:09,259 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 14:07:09,378 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-15 14:07:13,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3218220.0, ans=0.035 2024-08-15 14:07:29,157 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-15 14:07:49,552 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-15 14:08:01,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3218520.0, ans=0.125 2024-08-15 14:08:09,896 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 14:08:11,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218520.0, ans=0.1 2024-08-15 14:08:13,419 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-15 14:08:33,882 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3050, loss[loss=0.1121, beats_loss=0.008674, ecapa_loss=0.0001465, whisper_loss=0.102, over 14365.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001482, whisper_loss=0.09027, over 3894889.89 frames. ], batch size: 53, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:08:35,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3218720.0, ans=0.0 2024-08-15 14:08:38,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2024-08-15 14:08:46,258 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-15 14:08:51,047 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.307e+01 2.650e+01 2.894e+01 1.730e+02, threshold=5.300e+01, percent-clipped=1.0 2024-08-15 14:09:07,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3218820.0, ans=0.0 2024-08-15 14:09:18,994 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.903e-02 2024-08-15 14:09:34,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3219020.0, ans=0.125 2024-08-15 14:09:53,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3219120.0, ans=0.2 2024-08-15 14:10:01,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3100, loss[loss=0.09955, beats_loss=0.01001, ecapa_loss=0.0001398, whisper_loss=0.08815, over 14867.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001505, whisper_loss=0.09106, over 3860336.52 frames. 
], batch size: 59, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:10:08,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3219220.0, ans=0.125 2024-08-15 14:10:19,287 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-15 14:10:30,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=12.0 2024-08-15 14:10:40,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3219420.0, ans=0.05 2024-08-15 14:11:00,014 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-15 14:11:04,670 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-15 14:11:12,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3219620.0, ans=0.125 2024-08-15 14:11:22,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3219720.0, ans=0.125 2024-08-15 14:11:22,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3150, loss[loss=0.111, beats_loss=0.009339, ecapa_loss=0.0001423, whisper_loss=0.1003, over 14329.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001507, whisper_loss=0.09099, over 3822314.05 frames. ], batch size: 54, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:11:24,735 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 14:11:34,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3219720.0, ans=0.125 2024-08-15 14:11:38,436 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.271e+01 2.467e+01 2.810e+01 4.738e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-15 14:11:45,122 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-15 14:11:50,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3219820.0, ans=0.1 2024-08-15 14:12:06,359 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-15 14:12:09,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=15.0 2024-08-15 14:12:32,111 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-15 14:12:35,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3220120.0, ans=0.125 2024-08-15 14:12:37,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3220120.0, ans=0.2 2024-08-15 14:12:48,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3200, loss[loss=0.09887, beats_loss=0.01128, ecapa_loss=0.000123, whisper_loss=0.08636, over 16570.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.00015, whisper_loss=0.09117, over 3772283.76 frames. 
], batch size: 64, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:12:58,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3220220.0, ans=0.125 2024-08-15 14:13:22,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3220420.0, ans=0.2 2024-08-15 14:13:25,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3220420.0, ans=0.125 2024-08-15 14:13:47,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3220520.0, ans=0.125 2024-08-15 14:14:05,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3220620.0, ans=0.0 2024-08-15 14:14:14,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3250, loss[loss=0.09403, beats_loss=0.009741, ecapa_loss=0.0001647, whisper_loss=0.08265, over 22635.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01051, ecapa_loss=0.0001514, whisper_loss=0.09199, over 3827798.73 frames. ], batch size: 94, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:14:30,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3220720.0, ans=0.0 2024-08-15 14:14:30,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.376e+01 2.667e+01 3.123e+01 1.417e+02, threshold=5.334e+01, percent-clipped=1.0 2024-08-15 14:14:31,022 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-15 14:14:34,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3220820.0, ans=0.2 2024-08-15 14:14:39,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3220820.0, ans=0.0 2024-08-15 14:14:44,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3220820.0, ans=0.2 2024-08-15 14:14:46,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=22.5 2024-08-15 14:14:51,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3220920.0, ans=0.125 2024-08-15 14:15:01,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3220920.0, ans=0.125 2024-08-15 14:15:05,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3221020.0, ans=0.125 2024-08-15 14:15:12,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3221020.0, ans=0.125 2024-08-15 14:15:29,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3221120.0, ans=0.09899494936611666 2024-08-15 14:15:32,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3221120.0, ans=0.0 2024-08-15 14:15:38,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3300, loss[loss=0.1138, beats_loss=0.008784, ecapa_loss=0.0001702, whisper_loss=0.1033, over 22643.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01058, ecapa_loss=0.0001513, whisper_loss=0.09114, over 3834679.02 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:15:41,685 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-15 14:15:48,931 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 14:16:16,486 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 14:16:20,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3221420.0, ans=0.0 2024-08-15 14:16:29,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3221520.0, ans=0.5 2024-08-15 14:16:43,591 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-15 14:16:48,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3221620.0, ans=0.0 2024-08-15 14:16:54,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3221620.0, ans=0.125 2024-08-15 14:16:55,644 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-15 14:17:01,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3221620.0, ans=0.2 2024-08-15 14:17:04,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3350, loss[loss=0.08511, beats_loss=0.009829, ecapa_loss=0.000136, whisper_loss=0.07392, over 18863.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.0001513, whisper_loss=0.09176, over 3862257.78 frames. 
], batch size: 74, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:17:19,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.262e+01 2.579e+01 2.848e+01 8.552e+01, threshold=5.158e+01, percent-clipped=1.0 2024-08-15 14:17:35,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3221820.0, ans=0.125 2024-08-15 14:17:51,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221920.0, ans=0.1 2024-08-15 14:17:55,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3222020.0, ans=0.035 2024-08-15 14:18:12,025 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-15 14:18:12,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3222020.0, ans=0.0 2024-08-15 14:18:16,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.77 vs. limit=5.0 2024-08-15 14:18:18,275 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.200e-01 2024-08-15 14:18:20,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=15.0 2024-08-15 14:18:23,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-08-15 14:18:29,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3400, loss[loss=0.08707, beats_loss=0.01187, ecapa_loss=0.0001526, whisper_loss=0.07367, over 19368.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001498, whisper_loss=0.09132, over 3903022.99 frames. ], batch size: 79, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:18:37,581 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 14:18:58,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3222320.0, ans=0.125 2024-08-15 14:19:05,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3222420.0, ans=0.0 2024-08-15 14:19:08,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3222420.0, ans=0.125 2024-08-15 14:19:51,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3450, loss[loss=0.0941, beats_loss=0.0114, ecapa_loss=0.0001268, whisper_loss=0.08143, over 17380.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001504, whisper_loss=0.09091, over 3881953.30 frames. ], batch size: 65, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:20:07,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.344e+01 2.608e+01 2.883e+01 4.857e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-15 14:20:22,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3222820.0, ans=0.2 2024-08-15 14:20:37,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3222920.0, ans=0.1 2024-08-15 14:20:44,565 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 31 from Vox, 24 fro AS 2024-08-15 14:20:52,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3223020.0, ans=0.0 2024-08-15 14:20:55,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223020.0, ans=0.1 2024-08-15 14:20:56,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3223020.0, ans=0.125 2024-08-15 14:21:09,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3223120.0, ans=0.0 2024-08-15 14:21:13,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-15 14:21:17,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3500, loss[loss=0.0899, beats_loss=0.01057, ecapa_loss=0.0001687, whisper_loss=0.07765, over 19047.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001501, whisper_loss=0.09064, over 3907313.04 frames. ], batch size: 79, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:21:22,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-15 14:21:26,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. 
limit=22.5 2024-08-15 14:21:27,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3223220.0, ans=0.125 2024-08-15 14:22:06,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3223420.0, ans=0.0 2024-08-15 14:22:16,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-15 14:22:29,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5 2024-08-15 14:22:49,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3550, loss[loss=0.1146, beats_loss=0.008267, ecapa_loss=0.0001697, whisper_loss=0.1046, over 15585.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001502, whisper_loss=0.09059, over 3927786.25 frames. ], batch size: 60, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:22:56,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3223720.0, ans=0.025 2024-08-15 14:23:02,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.289e+01 2.498e+01 2.772e+01 4.287e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 14:23:30,969 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-15 14:23:33,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3223920.0, ans=0.125 2024-08-15 14:23:35,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3223920.0, ans=0.125 2024-08-15 14:23:37,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3223920.0, ans=0.125 2024-08-15 14:23:45,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5 2024-08-15 14:24:13,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224120.0, ans=0.1 2024-08-15 14:24:25,585 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3600, loss[loss=0.089, beats_loss=0.01172, ecapa_loss=0.0001254, whisper_loss=0.07603, over 21787.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.0001496, whisper_loss=0.09136, over 3931657.48 frames. ], batch size: 84, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:24:26,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.64 vs. 
limit=10.0 2024-08-15 14:24:37,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3224220.0, ans=0.0 2024-08-15 14:24:43,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3224220.0, ans=0.1 2024-08-15 14:25:17,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3224420.0, ans=0.125 2024-08-15 14:25:25,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3224520.0, ans=0.0 2024-08-15 14:25:46,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3224620.0, ans=0.0 2024-08-15 14:26:08,003 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 14:26:09,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3650, loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001523, whisper_loss=0.09075, over 18177.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001492, whisper_loss=0.09123, over 3883231.64 frames. ], batch size: 72, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:26:13,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3224720.0, ans=0.125 2024-08-15 14:26:29,485 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-15 14:26:30,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.327e+01 2.522e+01 2.934e+01 4.655e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-15 14:26:33,198 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.154e+00 2024-08-15 14:26:36,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3224820.0, ans=0.0 2024-08-15 14:27:29,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3225020.0, ans=0.2 2024-08-15 14:27:34,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3225020.0, ans=0.0 2024-08-15 14:27:34,759 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05073240399360657, model_norm_threshold=50.43817901611328 2024-08-15 14:27:34,944 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.552e+05, grad_sumsq=1.540e+07, orig_rms_sq=1.008e-02 2024-08-15 14:27:56,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3225120.0, ans=0.125 2024-08-15 14:28:01,769 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 11 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 14:28:17,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3225120.0, ans=0.0 2024-08-15 14:28:23,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3700, loss[loss=0.1095, beats_loss=0.01106, ecapa_loss=0.0001691, whisper_loss=0.09673, over 22011.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001499, whisper_loss=0.09066, over 3859246.94 frames. 
], batch size: 90, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:28:23,674 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-15 14:28:32,467 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 14:28:36,961 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-15 14:29:32,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2024-08-15 14:29:38,184 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 14:30:06,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3225520.0, ans=0.0 2024-08-15 14:30:22,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3225620.0, ans=0.1 2024-08-15 14:30:30,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-15 14:30:37,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3750, loss[loss=0.1051, beats_loss=0.01102, ecapa_loss=0.000129, whisper_loss=0.09276, over 14211.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0105, ecapa_loss=0.0001498, whisper_loss=0.09128, over 3876154.14 frames. ], batch size: 54, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:31:00,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.295e+01 2.515e+01 2.786e+01 9.942e+02, threshold=5.030e+01, percent-clipped=1.0 2024-08-15 14:31:19,624 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-15 14:31:41,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3225920.0, ans=0.09899494936611666 2024-08-15 14:31:59,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2024-08-15 14:32:10,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3226020.0, ans=0.125 2024-08-15 14:32:38,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3800, loss[loss=0.08241, beats_loss=0.01416, ecapa_loss=0.0001108, whisper_loss=0.06714, over 20311.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001496, whisper_loss=0.09025, over 3888340.62 frames. ], batch size: 80, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:32:54,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3226320.0, ans=0.125 2024-08-15 14:32:55,924 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 14:33:03,759 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-15 14:33:11,554 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 19 from Vox, 38 from AS 2024-08-15 14:33:35,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3226520.0, ans=0.125 2024-08-15 14:33:39,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3226520.0, ans=0.125 2024-08-15 14:33:42,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3226520.0, ans=0.125 2024-08-15 14:33:42,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3226520.0, ans=0.2 2024-08-15 14:34:00,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-08-15 14:34:07,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3226620.0, ans=0.125 2024-08-15 14:34:10,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3850, loss[loss=0.1164, beats_loss=0.00854, ecapa_loss=0.0001697, whisper_loss=0.1062, over 22680.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.000149, whisper_loss=0.09027, over 3892286.77 frames. ], batch size: 93, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:34:27,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.293e+01 2.527e+01 2.817e+01 3.723e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-15 14:34:56,167 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
12 from LS+wenet, 15 from Vox, 26 from AS 2024-08-15 14:35:06,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227020.0, ans=0.1 2024-08-15 14:35:12,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3227020.0, ans=0.125 2024-08-15 14:35:19,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227020.0, ans=0.125 2024-08-15 14:35:43,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3900, loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001637, whisper_loss=0.09084, over 15410.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001493, whisper_loss=0.09095, over 3869653.15 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:35:47,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3227220.0, ans=0.125 2024-08-15 14:36:03,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227320.0, ans=0.125 2024-08-15 14:36:03,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. 
limit=22.5 2024-08-15 14:36:40,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3227520.0, ans=0.05 2024-08-15 14:36:56,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3227620.0, ans=0.2 2024-08-15 14:37:01,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3227620.0, ans=0.09899494936611666 2024-08-15 14:37:04,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3227620.0, ans=0.0 2024-08-15 14:37:10,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3950, loss[loss=0.1234, beats_loss=0.009064, ecapa_loss=0.0001616, whisper_loss=0.1127, over 22932.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001509, whisper_loss=0.09104, over 3872847.03 frames. ], batch size: 90, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:37:24,633 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 25 from Vox, 33 from AS 2024-08-15 14:37:26,155 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.465e+01 2.719e+01 3.087e+01 1.515e+02, threshold=5.437e+01, percent-clipped=3.0 2024-08-15 14:37:36,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3227820.0, ans=0.125 2024-08-15 14:37:50,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3227920.0, ans=0.95 2024-08-15 14:37:50,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. 
limit=15.0 2024-08-15 14:37:52,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-08-15 14:37:57,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3227920.0, ans=0.125 2024-08-15 14:38:03,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3228020.0, ans=0.04949747468305833 2024-08-15 14:38:21,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3228120.0, ans=0.125 2024-08-15 14:38:26,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3228120.0, ans=0.5 2024-08-15 14:38:39,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4000, loss[loss=0.1114, beats_loss=0.01206, ecapa_loss=0.0001137, whisper_loss=0.09817, over 23452.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001513, whisper_loss=0.09072, over 3884055.37 frames. ], batch size: 89, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:38:43,169 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 16 from Vox, 41 from AS 2024-08-15 14:38:43,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2024-08-15 14:38:45,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3228220.0, ans=0.05 2024-08-15 14:38:50,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3228220.0, ans=0.035 2024-08-15 14:38:52,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3228220.0, ans=0.125 2024-08-15 14:39:23,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3228420.0, ans=0.2 2024-08-15 14:39:25,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3228420.0, ans=0.0 2024-08-15 14:39:43,084 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 27 from Vox, 23 from AS 2024-08-15 14:39:48,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228620.0, ans=0.1 2024-08-15 14:40:05,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4050, loss[loss=0.08851, beats_loss=0.01044, ecapa_loss=0.0001436, whisper_loss=0.07663, over 17666.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001505, whisper_loss=0.09051, over 3881713.87 frames. ], batch size: 71, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:40:19,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3228720.0, ans=0.125 2024-08-15 14:40:24,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.316e+01 2.614e+01 2.943e+01 4.388e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-15 14:40:42,207 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-15 14:41:16,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3229020.0, ans=0.125 2024-08-15 14:41:54,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-15 14:41:58,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3229220.0, ans=0.0 2024-08-15 14:41:58,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4100, loss[loss=0.09876, beats_loss=0.008206, ecapa_loss=0.0001586, whisper_loss=0.08897, over 14504.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001496, whisper_loss=0.09086, over 3893699.82 frames. ], batch size: 55, lr: 2.72e-03, grad_scale: 1.152921504606847e+18 2024-08-15 14:42:11,341 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 36 from LS+wenet, 21 from Vox, 38 from AS 2024-08-15 14:42:11,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3229220.0, ans=0.125 2024-08-15 14:43:28,391 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 14:43:47,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3229620.0, ans=0.125 2024-08-15 14:43:50,168 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
32 from LS+wenet, 18 from Vox, 36 from AS 2024-08-15 14:44:00,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3229620.0, ans=0.0 2024-08-15 14:44:03,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3229720.0, ans=0.125 2024-08-15 14:44:04,054 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4150, loss[loss=0.1019, beats_loss=0.009824, ecapa_loss=0.0001585, whisper_loss=0.09046, over 15474.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01064, ecapa_loss=0.0001501, whisper_loss=0.09108, over 3922857.38 frames. ], batch size: 61, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:44:28,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.317e+01 2.580e+01 2.886e+01 4.298e+01, threshold=5.160e+01, percent-clipped=0.0 2024-08-15 14:44:35,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3229820.0, ans=0.025 2024-08-15 14:44:42,505 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 from AS 2024-08-15 14:44:56,531 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS 2024-08-15 14:45:12,607 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 23 from Vox, 24 from AS 2024-08-15 14:45:33,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4200, loss[loss=0.1057, beats_loss=0.01169, ecapa_loss=0.0001442, whisper_loss=0.09255, over 18544.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001504, whisper_loss=0.09082, over 3896686.12 frames. 
], batch size: 72, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:45:35,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3230220.0, ans=0.125 2024-08-15 14:45:45,582 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-15 14:46:14,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3230420.0, ans=0.5 2024-08-15 14:46:46,183 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 15 from Vox, 41 from AS 2024-08-15 14:46:48,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230620.0, ans=0.1 2024-08-15 14:47:02,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4250, loss[loss=0.09194, beats_loss=0.007745, ecapa_loss=0.0001979, whisper_loss=0.08222, over 14352.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001509, whisper_loss=0.0901, over 3900813.42 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:47:20,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.308e+01 2.518e+01 2.859e+01 8.550e+01, threshold=5.036e+01, percent-clipped=1.0 2024-08-15 14:47:44,609 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 from AS 2024-08-15 14:47:52,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.49 vs. 
limit=15.0 2024-08-15 14:47:58,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3231020.0, ans=0.0 2024-08-15 14:48:01,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3231020.0, ans=0.05 2024-08-15 14:48:20,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3231120.0, ans=0.0 2024-08-15 14:48:21,911 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 from AS 2024-08-15 14:48:34,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4300, loss[loss=0.1024, beats_loss=0.009906, ecapa_loss=0.0001525, whisper_loss=0.09099, over 22536.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.08975, over 3878300.97 frames. ], batch size: 93, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:48:35,248 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-15 14:48:48,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3231220.0, ans=0.0 2024-08-15 14:48:50,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3231320.0, ans=0.0 2024-08-15 14:49:02,823 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.867e-01 2024-08-15 14:49:13,170 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-15 14:49:36,565 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-15 14:49:42,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3231620.0, ans=0.2 2024-08-15 14:49:54,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3231620.0, ans=0.125 2024-08-15 14:49:59,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4350, loss[loss=0.09312, beats_loss=0.01271, ecapa_loss=0.0001366, whisper_loss=0.07905, over 22375.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.0001512, whisper_loss=0.08937, over 3889847.27 frames. ], batch size: 92, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:50:00,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3231720.0, ans=0.125 2024-08-15 14:50:06,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3231720.0, ans=0.2 2024-08-15 14:50:17,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.376e+01 2.619e+01 2.961e+01 5.969e+01, threshold=5.237e+01, percent-clipped=2.0 2024-08-15 14:50:18,047 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 15 from Vox, 35 from AS 2024-08-15 14:50:18,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3231820.0, ans=0.2 2024-08-15 14:50:50,393 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
19 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 14:50:51,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3231920.0, ans=0.05 2024-08-15 14:50:57,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3232020.0, ans=10.0 2024-08-15 14:51:28,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4400, loss[loss=0.09789, beats_loss=0.009915, ecapa_loss=0.0001621, whisper_loss=0.08636, over 14790.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01073, ecapa_loss=0.0001494, whisper_loss=0.08878, over 3887058.46 frames. ], batch size: 59, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:51:45,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3232320.0, ans=0.125 2024-08-15 14:51:50,369 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 from AS 2024-08-15 14:52:38,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3232620.0, ans=0.0 2024-08-15 14:52:47,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3232620.0, ans=0.125 2024-08-15 14:52:51,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4450, loss[loss=0.1315, beats_loss=0.005653, ecapa_loss=0.0001969, whisper_loss=0.1239, over 20575.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001491, whisper_loss=0.08939, over 3873058.17 frames. 
], batch size: 83, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:53:03,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3232720.0, ans=0.2 2024-08-15 14:53:08,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.314e+01 2.575e+01 2.815e+01 3.995e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-15 14:53:17,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3232820.0, ans=0.0 2024-08-15 14:53:17,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3232820.0, ans=0.125 2024-08-15 14:53:23,434 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 14:53:41,008 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 from AS 2024-08-15 14:54:10,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3233120.0, ans=0.0 2024-08-15 14:54:15,405 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 from AS 2024-08-15 14:54:21,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3233120.0, ans=0.05 2024-08-15 14:54:24,024 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-15 14:54:25,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4500, loss[loss=0.1156, beats_loss=0.01079, ecapa_loss=0.0001421, whisper_loss=0.1034, over 18637.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01068, ecapa_loss=0.0001497, whisper_loss=0.08907, over 3897807.08 frames. 
], batch size: 73, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:54:30,243 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 13 from LS+wenet, 23 from Vox, 36 from AS 2024-08-15 14:54:35,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3233220.0, ans=0.125 2024-08-15 14:54:59,269 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 36 from LS+wenet, 18 from Vox, 41 from AS 2024-08-15 14:55:07,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3233420.0, ans=0.0 2024-08-15 14:55:25,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3233520.0, ans=0.1 2024-08-15 14:55:38,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3233620.0, ans=0.125 2024-08-15 14:55:44,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3233620.0, ans=0.125 2024-08-15 14:55:51,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4550, loss[loss=0.1021, beats_loss=0.01077, ecapa_loss=0.0001264, whisper_loss=0.09006, over 18787.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001498, whisper_loss=0.08938, over 3913428.44 frames. ], batch size: 73, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:55:56,680 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 14:56:07,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.406e+01 2.642e+01 2.962e+01 1.202e+02, threshold=5.284e+01, percent-clipped=1.0 2024-08-15 14:56:22,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3233820.0, ans=0.04949747468305833 2024-08-15 14:56:49,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-15 14:57:06,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3234120.0, ans=0.125 2024-08-15 14:57:16,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3234120.0, ans=0.125 2024-08-15 14:57:18,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4600, loss[loss=0.105, beats_loss=0.01101, ecapa_loss=0.0001309, whisper_loss=0.09265, over 22467.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001494, whisper_loss=0.08943, over 3915530.12 frames. ], batch size: 88, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:57:32,109 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 from AS 2024-08-15 14:57:35,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3234320.0, ans=0.2 2024-08-15 14:57:46,856 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-15 14:58:03,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234420.0, ans=0.1 2024-08-15 14:58:03,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3234420.0, ans=0.2 2024-08-15 14:58:16,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3234520.0, ans=0.0 2024-08-15 14:58:40,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4650, loss[loss=0.1049, beats_loss=0.008827, ecapa_loss=0.0001228, whisper_loss=0.09487, over 15741.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01077, ecapa_loss=0.0001498, whisper_loss=0.0892, over 3900371.88 frames. ], batch size: 57, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 14:58:51,736 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 14:58:56,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.314e+01 2.498e+01 2.884e+01 4.685e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-15 14:59:04,984 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 from AS 2024-08-15 14:59:08,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3234820.0, ans=0.125 2024-08-15 14:59:14,943 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 14:59:25,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-08-15 14:59:26,481 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
13 from LS+wenet, 22 from Vox, 18 from AS 2024-08-15 14:59:27,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2024-08-15 14:59:28,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3235020.0, ans=0.0 2024-08-15 14:59:38,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3235020.0, ans=0.125 2024-08-15 14:59:54,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3235120.0, ans=0.0 2024-08-15 14:59:57,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.01 vs. limit=22.5 2024-08-15 15:00:08,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4700, loss[loss=0.116, beats_loss=0.009654, ecapa_loss=0.0001847, whisper_loss=0.1045, over 22400.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01072, ecapa_loss=0.0001498, whisper_loss=0.08929, over 3892961.78 frames. ], batch size: 91, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:00:42,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3235420.0, ans=0.125 2024-08-15 15:00:53,874 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:01:06,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3235520.0, ans=0.0 2024-08-15 15:01:13,974 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 from AS 2024-08-15 15:01:14,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3235520.0, ans=0.125 2024-08-15 15:01:21,007 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.341e-02 2024-08-15 15:01:28,070 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 from AS 2024-08-15 15:01:33,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=22.5 2024-08-15 15:01:33,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4750, loss[loss=0.09789, beats_loss=0.009531, ecapa_loss=0.0001382, whisper_loss=0.08698, over 20389.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.000149, whisper_loss=0.08951, over 3903096.81 frames. ], batch size: 81, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:01:33,653 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 from AS 2024-08-15 15:01:41,204 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 15:01:45,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3235720.0, ans=0.95 2024-08-15 15:01:49,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.242e+01 2.450e+01 2.790e+01 3.790e+01, threshold=4.901e+01, percent-clipped=0.0 2024-08-15 15:02:24,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3236020.0, ans=0.1 2024-08-15 15:02:39,362 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
16 from LS+wenet, 17 from Vox, 31 from AS 2024-08-15 15:02:39,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3236120.0, ans=0.1 2024-08-15 15:02:51,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4800, loss[loss=0.09808, beats_loss=0.01273, ecapa_loss=0.000136, whisper_loss=0.08399, over 21790.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0108, ecapa_loss=0.0001491, whisper_loss=0.08829, over 3884902.12 frames. ], batch size: 87, lr: 2.72e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:02:58,375 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 from AS 2024-08-15 15:03:15,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3236320.0, ans=0.05 2024-08-15 15:03:22,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3236420.0, ans=0.0 2024-08-15 15:03:30,830 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 15:03:39,020 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-15 15:03:44,821 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.629e-03 2024-08-15 15:03:55,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.62 vs. 
limit=15.0 2024-08-15 15:03:58,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3236620.0, ans=0.125 2024-08-15 15:04:05,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3236620.0, ans=0.125 2024-08-15 15:04:09,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4850, loss[loss=0.09589, beats_loss=0.01352, ecapa_loss=0.000101, whisper_loss=0.08136, over 22627.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01077, ecapa_loss=0.000149, whisper_loss=0.08905, over 3900230.62 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:04:15,057 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 from AS 2024-08-15 15:04:24,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.428e+01 2.638e+01 3.060e+01 4.898e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-15 15:04:32,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3236820.0, ans=0.0 2024-08-15 15:04:41,428 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 from AS 2024-08-15 15:04:41,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236920.0, ans=0.1 2024-08-15 15:04:42,741 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 from AS 2024-08-15 15:04:46,889 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 21 from Vox, 32 from AS 2024-08-15 15:04:54,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3237020.0, ans=0.0 2024-08-15 15:05:12,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3237120.0, ans=0.125 2024-08-15 15:05:15,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3237120.0, ans=0.125 2024-08-15 15:05:17,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3237120.0, ans=0.125 2024-08-15 15:05:21,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4900, loss[loss=0.1176, beats_loss=0.008077, ecapa_loss=0.0001869, whisper_loss=0.1077, over 21477.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001488, whisper_loss=0.09005, over 3895375.26 frames. ], batch size: 86, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:05:22,293 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 12 from Vox, 36 from AS 2024-08-15 15:05:25,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-15 15:05:33,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.98 vs. 
limit=12.0 2024-08-15 15:05:36,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3237320.0, ans=0.125 2024-08-15 15:05:44,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3237320.0, ans=10.0 2024-08-15 15:05:51,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3237420.0, ans=0.05 2024-08-15 15:05:51,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3237420.0, ans=0.1 2024-08-15 15:05:58,480 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS 2024-08-15 15:06:31,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4950, loss[loss=0.08499, beats_loss=0.01369, ecapa_loss=0.0001185, whisper_loss=0.07012, over 22098.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01068, ecapa_loss=0.000149, whisper_loss=0.08918, over 3883990.44 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:06:43,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3237720.0, ans=0.0 2024-08-15 15:06:45,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.352e+01 2.615e+01 2.945e+01 2.370e+02, threshold=5.229e+01, percent-clipped=2.0 2024-08-15 15:06:47,108 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 15:06:48,354 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-15 15:06:50,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3237820.0, ans=0.2 2024-08-15 15:07:10,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3237920.0, ans=0.0 2024-08-15 15:07:31,072 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 14 from Vox, 32 from AS 2024-08-15 15:07:35,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3238120.0, ans=0.125 2024-08-15 15:07:38,237 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 from AS 2024-08-15 15:07:40,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5000, loss[loss=0.1076, beats_loss=0.01245, ecapa_loss=0.0001424, whisper_loss=0.09371, over 22672.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01069, ecapa_loss=0.000149, whisper_loss=0.08935, over 3856445.47 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:07:42,218 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 from AS 2024-08-15 15:07:49,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3238220.0, ans=0.125 2024-08-15 15:07:51,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3238220.0, ans=0.0 2024-08-15 15:07:59,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-15 15:08:09,822 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 12 from Vox, 26 from AS 2024-08-15 15:08:11,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3238420.0, ans=0.1 2024-08-15 15:08:21,510 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 36 from LS+wenet, 15 from Vox, 36 from AS 2024-08-15 15:08:23,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3238520.0, ans=0.125 2024-08-15 15:08:24,024 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS 2024-08-15 15:08:31,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-08-15 15:08:48,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5050, loss[loss=0.08677, beats_loss=0.01304, ecapa_loss=0.0001323, whisper_loss=0.0724, over 18688.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001487, whisper_loss=0.09037, over 3902760.41 frames. ], batch size: 74, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:08:57,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=22.5 2024-08-15 15:08:58,183 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 16 from Vox, 40 from AS 2024-08-15 15:09:02,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.271e+01 2.496e+01 2.876e+01 1.159e+02, threshold=4.993e+01, percent-clipped=2.0 2024-08-15 15:09:03,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.49 vs. 
limit=10.0 2024-08-15 15:09:07,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3238820.0, ans=0.125 2024-08-15 15:09:26,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3238920.0, ans=0.2 2024-08-15 15:09:28,408 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-15 15:09:38,130 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-15 15:09:38,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3239020.0, ans=0.125 2024-08-15 15:09:39,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3239020.0, ans=0.07 2024-08-15 15:09:55,794 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5100, loss[loss=0.1233, beats_loss=0.009223, ecapa_loss=0.0001428, whisper_loss=0.1127, over 22743.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01069, ecapa_loss=0.0001499, whisper_loss=0.09049, over 3917170.78 frames. ], batch size: 88, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:09:57,562 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 15:10:25,630 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-15 15:10:29,439 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-15 15:10:29,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3239420.0, ans=0.1 2024-08-15 15:10:30,749 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-15 15:10:46,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239520.0, ans=0.1 2024-08-15 15:10:47,194 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-15 15:10:49,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3239620.0, ans=0.04949747468305833 2024-08-15 15:10:58,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3239620.0, ans=6.0 2024-08-15 15:11:03,286 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5150, loss[loss=0.09957, beats_loss=0.008723, ecapa_loss=0.0001541, whisper_loss=0.0893, over 14348.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01073, ecapa_loss=0.0001496, whisper_loss=0.09072, over 3929684.08 frames. ], batch size: 55, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:11:16,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.367e+01 2.663e+01 3.034e+01 8.372e+01, threshold=5.326e+01, percent-clipped=1.0 2024-08-15 15:11:37,514 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 15:11:39,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3239920.0, ans=0.125 2024-08-15 15:11:49,670 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2024-08-15 15:12:04,301 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-15 15:12:09,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2024-08-15 15:12:15,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5200, loss[loss=0.08181, beats_loss=0.0128, ecapa_loss=0.0001742, whisper_loss=0.06727, over 21399.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01082, ecapa_loss=0.000149, whisper_loss=0.09027, over 3937762.09 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:12:15,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3240220.0, ans=0.0 2024-08-15 15:12:21,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240220.0, ans=0.1 2024-08-15 15:12:25,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3240220.0, ans=0.125 2024-08-15 15:12:48,415 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 15:13:05,493 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-15 15:13:14,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-15 15:13:26,221 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5250, loss[loss=0.102, beats_loss=0.01109, ecapa_loss=0.0001774, whisper_loss=0.08909, over 21241.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01082, ecapa_loss=0.0001488, whisper_loss=0.08922, over 3891141.43 frames. 
], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:13:29,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-15 15:13:40,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.275e+01 2.578e+01 2.785e+01 8.879e+01, threshold=5.156e+01, percent-clipped=2.0 2024-08-15 15:13:50,997 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 15:13:54,216 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-15 15:14:00,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3240920.0, ans=0.2 2024-08-15 15:14:01,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-15 15:14:03,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3240920.0, ans=0.1 2024-08-15 15:14:04,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3240920.0, ans=6.0 2024-08-15 15:14:14,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3241020.0, ans=0.125 2024-08-15 15:14:21,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3241020.0, ans=0.125 2024-08-15 15:14:22,828 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
15 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-15 15:14:24,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-15 15:14:26,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.17 vs. limit=6.0 2024-08-15 15:14:26,748 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-15 15:14:34,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3241120.0, ans=0.125 2024-08-15 15:14:35,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3241120.0, ans=0.125 2024-08-15 15:14:37,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5300, loss[loss=0.1031, beats_loss=0.01163, ecapa_loss=0.0001468, whisper_loss=0.08998, over 22603.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01084, ecapa_loss=0.0001481, whisper_loss=0.0891, over 3903197.50 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:14:44,648 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-15 15:14:53,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3241320.0, ans=15.0 2024-08-15 15:15:06,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3241420.0, ans=0.125 2024-08-15 15:15:06,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3241420.0, ans=0.0 2024-08-15 15:15:11,596 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-15 15:15:14,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3241420.0, ans=0.125 2024-08-15 15:15:27,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3241520.0, ans=0.125 2024-08-15 15:15:39,627 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-15 15:15:47,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5350, loss[loss=0.1023, beats_loss=0.009512, ecapa_loss=0.0001461, whisper_loss=0.09131, over 15819.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01075, ecapa_loss=0.0001488, whisper_loss=0.08956, over 3891385.02 frames. ], batch size: 61, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:15:48,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-08-15 15:15:52,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3241720.0, ans=0.0 2024-08-15 15:16:01,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.335e+01 2.708e+01 3.077e+01 2.135e+02, threshold=5.416e+01, percent-clipped=3.0 2024-08-15 15:16:23,675 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 15:16:25,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3241920.0, ans=0.2 2024-08-15 15:16:37,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3242020.0, ans=10.0 2024-08-15 15:16:56,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5400, loss[loss=0.08726, beats_loss=0.01034, ecapa_loss=0.0001369, whisper_loss=0.07555, over 16160.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.000149, whisper_loss=0.09021, over 3866939.22 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:17:03,578 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08951901644468307, model_norm_threshold=54.15595626831055 2024-08-15 15:17:03,752 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.765e+04, grad_sumsq=4.730e+06, orig_rms_sq=1.007e-02 2024-08-15 15:17:07,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. limit=10.0 2024-08-15 15:17:08,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3242220.0, ans=0.125 2024-08-15 15:17:15,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3242320.0, ans=0.0 2024-08-15 15:17:20,580 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-15 15:17:26,473 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
14 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-15 15:17:38,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3242520.0, ans=0.125 2024-08-15 15:17:47,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242520.0, ans=0.1 2024-08-15 15:18:07,657 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5450, loss[loss=0.09583, beats_loss=0.01266, ecapa_loss=9.525e-05, whisper_loss=0.08222, over 16888.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.0001485, whisper_loss=0.09032, over 3883398.27 frames. ], batch size: 63, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:18:17,511 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.753e+05 2024-08-15 15:18:22,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.261e+01 2.541e+01 2.887e+01 6.050e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-15 15:18:25,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3242820.0, ans=0.0 2024-08-15 15:18:58,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0 2024-08-15 15:19:01,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3243020.0, ans=0.1 2024-08-15 15:19:04,495 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-15 15:19:09,316 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
20 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-15 15:19:23,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3243120.0, ans=0.1 2024-08-15 15:19:24,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2024-08-15 15:19:26,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5500, loss[loss=0.1011, beats_loss=0.01102, ecapa_loss=0.0001412, whisper_loss=0.08864, over 23197.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.0001492, whisper_loss=0.09044, over 3874106.82 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:19:38,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-15 15:19:54,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3243320.0, ans=0.125 2024-08-15 15:20:16,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3243520.0, ans=0.125 2024-08-15 15:20:24,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0 2024-08-15 15:20:49,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5550, loss[loss=0.1062, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.0941, over 21973.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001494, whisper_loss=0.09022, over 3884212.88 frames. 
], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:21:06,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.298e+01 2.585e+01 2.775e+01 4.176e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-15 15:21:09,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3243820.0, ans=0.0 2024-08-15 15:21:16,853 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 15:22:01,039 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-15 15:22:02,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2024-08-15 15:22:09,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3244120.0, ans=0.125 2024-08-15 15:22:12,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3244220.0, ans=0.0 2024-08-15 15:22:13,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5600, loss[loss=0.1051, beats_loss=0.009512, ecapa_loss=0.0001818, whisper_loss=0.09376, over 22166.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001489, whisper_loss=0.09053, over 3897389.09 frames. 
], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:22:42,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3244320.0, ans=0.125 2024-08-15 15:23:18,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3244620.0, ans=0.125 2024-08-15 15:23:34,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3244720.0, ans=0.125 2024-08-15 15:23:35,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5650, loss[loss=0.115, beats_loss=0.009225, ecapa_loss=0.0001219, whisper_loss=0.1046, over 21849.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001492, whisper_loss=0.09008, over 3905128.56 frames. ], batch size: 83, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:23:48,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3244720.0, ans=0.125 2024-08-15 15:23:50,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-15 15:23:55,554 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.317e+01 2.489e+01 2.780e+01 3.847e+01, threshold=4.978e+01, percent-clipped=0.0 2024-08-15 15:24:09,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=22.5 2024-08-15 15:24:09,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3244820.0, ans=0.035 2024-08-15 15:24:12,711 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
14 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-15 15:24:49,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3245120.0, ans=0.125 2024-08-15 15:24:53,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3245120.0, ans=0.1 2024-08-15 15:25:01,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5700, loss[loss=0.1132, beats_loss=0.01199, ecapa_loss=0.0001262, whisper_loss=0.09993, over 18577.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01075, ecapa_loss=0.0001496, whisper_loss=0.08987, over 3938644.75 frames. ], batch size: 73, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:25:41,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3245420.0, ans=0.125 2024-08-15 15:25:44,370 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-15 15:25:49,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3245420.0, ans=0.125 2024-08-15 15:25:58,396 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-15 15:26:04,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3245520.0, ans=0.5 2024-08-15 15:26:28,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5750, loss[loss=0.09171, beats_loss=0.009024, ecapa_loss=0.0001897, whisper_loss=0.08079, over 17801.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001503, whisper_loss=0.0904, over 3945371.42 frames. 
], batch size: 72, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:26:43,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3245720.0, ans=0.1 2024-08-15 15:26:46,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.388e+01 2.608e+01 2.971e+01 1.987e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-15 15:26:51,648 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-15 15:26:55,801 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 15:26:58,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3245820.0, ans=0.0 2024-08-15 15:27:16,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3245920.0, ans=0.125 2024-08-15 15:27:25,810 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-15 15:27:26,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=12.0 2024-08-15 15:27:33,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3246020.0, ans=0.125 2024-08-15 15:27:43,663 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 15:27:45,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3246120.0, ans=0.125 2024-08-15 15:27:47,731 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-15 15:27:53,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5800, loss[loss=0.08377, beats_loss=0.0129, ecapa_loss=0.0001383, whisper_loss=0.06948, over 17269.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01069, ecapa_loss=0.00015, whisper_loss=0.09011, over 3892338.35 frames. ], batch size: 69, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:28:05,031 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 19 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-15 15:29:02,768 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-15 15:29:11,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5850, loss[loss=0.1052, beats_loss=0.01228, ecapa_loss=0.0001306, whisper_loss=0.09158, over 22484.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.000149, whisper_loss=0.08988, over 3922755.04 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:29:16,757 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 15:29:26,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.269e+01 2.533e+01 2.888e+01 4.930e+01, threshold=5.067e+01, percent-clipped=0.0 2024-08-15 15:29:26,648 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 15:29:36,869 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-15 15:29:41,328 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 15:29:44,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3246920.0, ans=0.125 2024-08-15 15:29:54,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3247020.0, ans=0.2 2024-08-15 15:29:56,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3247020.0, ans=0.04949747468305833 2024-08-15 15:30:25,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2024-08-15 15:30:25,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5900, loss[loss=0.08375, beats_loss=0.01331, ecapa_loss=0.0001329, whisper_loss=0.06911, over 20848.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01076, ecapa_loss=0.0001489, whisper_loss=0.08928, over 3876627.39 frames. ], batch size: 82, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:30:51,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3247320.0, ans=0.1 2024-08-15 15:30:57,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3247420.0, ans=0.0 2024-08-15 15:31:28,674 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-15 15:31:30,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-15 15:31:32,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.94 vs. 
limit=15.0 2024-08-15 15:31:43,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5950, loss[loss=0.1042, beats_loss=0.008282, ecapa_loss=0.0001334, whisper_loss=0.09457, over 15392.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001489, whisper_loss=0.09024, over 3922874.58 frames. ], batch size: 57, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:31:46,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3247720.0, ans=0.125 2024-08-15 15:31:51,920 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:31:58,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.298e+01 2.552e+01 2.863e+01 3.856e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-15 15:32:12,045 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-15 15:32:17,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3247920.0, ans=0.125 2024-08-15 15:32:18,987 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-15 15:32:22,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247920.0, ans=0.1 2024-08-15 15:32:54,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6000, loss[loss=0.102, beats_loss=0.01103, ecapa_loss=0.0001626, whisper_loss=0.08932, over 22170.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01075, ecapa_loss=0.0001485, whisper_loss=0.09035, over 3918916.85 frames. 
], batch size: 91, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:32:54,466 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 15:33:33,412 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005302, whisper_loss=0.2464, over 922467.00 frames. 2024-08-15 15:33:54,123 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004186, beats_loss=0, ecapa_loss=0.0004186, whisper_loss=0, over 939242.00 frames. 2024-08-15 15:35:52,485 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02334, beats_loss=0.02334, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 15:35:52,494 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 15:35:57,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3248220.0, ans=0.125 2024-08-15 15:36:00,886 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-15 15:36:12,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3248320.0, ans=0.125 2024-08-15 15:36:19,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3248420.0, ans=0.125 2024-08-15 15:36:23,674 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-15 15:36:24,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3248420.0, ans=0.95 2024-08-15 15:36:28,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3248420.0, ans=0.0 2024-08-15 15:36:30,974 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-15 15:36:34,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248520.0, ans=0.1 2024-08-15 15:36:41,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.98 vs. limit=15.0 2024-08-15 15:36:50,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3248620.0, ans=0.125 2024-08-15 15:36:50,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3248620.0, ans=0.125 2024-08-15 15:36:52,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2024-08-15 15:36:57,339 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-15 15:37:02,507 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6050, loss[loss=0.09535, beats_loss=0.01303, ecapa_loss=0.000135, whisper_loss=0.08097, over 16553.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01071, ecapa_loss=0.0001494, whisper_loss=0.09088, over 3916094.20 frames. ], batch size: 67, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:37:09,646 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-15 15:37:16,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.402e+01 2.591e+01 2.977e+01 8.754e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-15 15:37:23,862 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-15 15:37:25,610 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.043e+00 2024-08-15 15:37:42,565 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-15 15:37:42,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3248920.0, ans=0.125 2024-08-15 15:37:44,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3249020.0, ans=0.125 2024-08-15 15:38:12,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.77 vs. limit=22.5 2024-08-15 15:38:12,788 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6100, loss[loss=0.09968, beats_loss=0.01147, ecapa_loss=0.000105, whisper_loss=0.08716, over 18681.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.000151, whisper_loss=0.09044, over 3919851.96 frames. ], batch size: 69, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:38:25,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3249220.0, ans=0.125 2024-08-15 15:38:30,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3249320.0, ans=0.125 2024-08-15 15:38:31,322 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-15 15:38:34,208 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-15 15:38:41,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. 
limit=6.0 2024-08-15 15:38:55,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0 2024-08-15 15:38:58,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0 2024-08-15 15:39:00,713 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-15 15:39:12,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3249620.0, ans=0.2 2024-08-15 15:39:22,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6150, loss[loss=0.119, beats_loss=0.009046, ecapa_loss=0.0001288, whisper_loss=0.1087, over 24209.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001514, whisper_loss=0.09094, over 3903512.57 frames. ], batch size: 91, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:39:33,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. limit=10.0 2024-08-15 15:39:37,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.219e+01 2.437e+01 2.676e+01 4.381e+01, threshold=4.874e+01, percent-clipped=0.0 2024-08-15 15:39:48,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=15.0 2024-08-15 15:40:05,718 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-15 15:40:14,285 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-15 15:40:30,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.53 vs. 
limit=15.0 2024-08-15 15:40:34,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6200, loss[loss=0.05911, beats_loss=0.01583, ecapa_loss=0.0001013, whisper_loss=0.04227, over 14379.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001499, whisper_loss=0.09054, over 3873100.30 frames. ], batch size: 58, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:40:36,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3250220.0, ans=0.125 2024-08-15 15:40:36,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-15 15:40:42,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3250220.0, ans=0.2 2024-08-15 15:40:55,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3250320.0, ans=0.125 2024-08-15 15:41:46,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6250, loss[loss=0.08687, beats_loss=0.01207, ecapa_loss=0.0001276, whisper_loss=0.07352, over 21362.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001495, whisper_loss=0.09036, over 3906935.77 frames. ], batch size: 85, lr: 2.71e-03, grad_scale: 1.152921504606847e+18 2024-08-15 15:41:46,732 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-15 15:41:48,173 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-15 15:41:58,027 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
20 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-15 15:42:01,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.383e+01 2.624e+01 2.920e+01 1.622e+02, threshold=5.248e+01, percent-clipped=1.0 2024-08-15 15:42:06,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3250820.0, ans=0.05 2024-08-15 15:42:19,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3250920.0, ans=0.125 2024-08-15 15:42:20,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3250920.0, ans=0.04949747468305833 2024-08-15 15:42:22,010 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-15 15:42:24,721 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-15 15:42:25,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3250920.0, ans=0.125 2024-08-15 15:42:27,489 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-15 15:42:41,793 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.669e+00 2024-08-15 15:42:42,709 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 15:42:44,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3251120.0, ans=0.0 2024-08-15 15:42:56,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6300, loss[loss=0.09971, beats_loss=0.009917, ecapa_loss=0.0001762, whisper_loss=0.08803, over 22057.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001505, whisper_loss=0.09045, over 3906146.18 frames. 
], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:43:09,441 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 36 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-15 15:43:38,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3251520.0, ans=0.1 2024-08-15 15:44:00,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3251620.0, ans=0.0 2024-08-15 15:44:06,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6350, loss[loss=0.09625, beats_loss=0.01267, ecapa_loss=0.0001488, whisper_loss=0.08209, over 23787.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001497, whisper_loss=0.09094, over 3926335.85 frames. ], batch size: 93, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:44:13,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2024-08-15 15:44:19,412 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-15 15:44:22,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.294e+01 2.523e+01 2.815e+01 3.585e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-15 15:44:26,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3251820.0, ans=0.2 2024-08-15 15:44:36,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0 2024-08-15 15:44:46,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-15 15:44:49,789 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
16 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-15 15:44:54,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3252020.0, ans=0.125 2024-08-15 15:45:02,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3252120.0, ans=0.125 2024-08-15 15:45:13,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3252120.0, ans=0.125 2024-08-15 15:45:17,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6400, loss[loss=0.09227, beats_loss=0.01112, ecapa_loss=0.0001375, whisper_loss=0.07978, over 17801.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01071, ecapa_loss=0.0001493, whisper_loss=0.09102, over 3944359.87 frames. ], batch size: 70, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:45:18,121 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-15 15:45:36,036 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 40 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-15 15:45:55,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3252420.0, ans=0.125 2024-08-15 15:45:56,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-08-15 15:45:59,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3252520.0, ans=0.125 2024-08-15 15:45:59,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3252520.0, ans=0.0 2024-08-15 15:46:02,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.25 vs. 
limit=12.0 2024-08-15 15:46:27,826 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6450, loss[loss=0.1055, beats_loss=0.009503, ecapa_loss=0.0001517, whisper_loss=0.09443, over 23566.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0107, ecapa_loss=0.0001501, whisper_loss=0.09129, over 3975331.20 frames. ], batch size: 95, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:46:31,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3252720.0, ans=0.0 2024-08-15 15:46:42,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.348e+01 2.696e+01 2.907e+01 4.718e+01, threshold=5.393e+01, percent-clipped=0.0 2024-08-15 15:47:21,309 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 40 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 15:47:27,137 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-15 15:47:41,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3253220.0, ans=0.125 2024-08-15 15:47:41,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6500, loss[loss=0.08128, beats_loss=0.01436, ecapa_loss=0.0001408, whisper_loss=0.06552, over 18269.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001496, whisper_loss=0.09155, over 3964900.35 frames. ], batch size: 77, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:47:49,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=12.0 2024-08-15 15:47:50,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3253220.0, ans=0.0 2024-08-15 15:47:56,661 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
14 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-15 15:47:57,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3253320.0, ans=0.2 2024-08-15 15:48:27,783 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07086287438869476, model_norm_threshold=53.929649353027344 2024-08-15 15:48:27,952 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.590e+04, grad_sumsq=8.502e+06, orig_rms_sq=1.010e-02 2024-08-15 15:48:55,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3253720.0, ans=0.125 2024-08-15 15:48:56,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6550, loss[loss=0.09737, beats_loss=0.01023, ecapa_loss=0.0001891, whisper_loss=0.08525, over 21469.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01067, ecapa_loss=0.0001497, whisper_loss=0.09164, over 3967175.13 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:49:03,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3253720.0, ans=0.2 2024-08-15 15:49:07,818 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-15 15:49:11,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.371e+01 2.638e+01 2.935e+01 7.610e+02, threshold=5.275e+01, percent-clipped=2.0 2024-08-15 15:49:14,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=12.0 2024-08-15 15:49:29,847 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-15 15:49:42,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3254020.0, ans=0.1 2024-08-15 15:49:42,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3254020.0, ans=0.125 2024-08-15 15:49:54,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-15 15:50:01,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3254120.0, ans=0.2 2024-08-15 15:50:07,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6600, loss[loss=0.127, beats_loss=0.009548, ecapa_loss=0.0001574, whisper_loss=0.1159, over 22468.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01062, ecapa_loss=0.0001509, whisper_loss=0.09188, over 3977304.52 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:50:22,571 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-15 15:50:27,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3254320.0, ans=0.0 2024-08-15 15:50:36,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3254420.0, ans=0.2 2024-08-15 15:50:38,665 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-15 15:50:47,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3254420.0, ans=0.1 2024-08-15 15:50:58,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3254520.0, ans=0.125 2024-08-15 15:51:07,875 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-15 15:51:09,275 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-15 15:51:09,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3254620.0, ans=0.07 2024-08-15 15:51:19,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6650, loss[loss=0.1148, beats_loss=0.009889, ecapa_loss=0.000155, whisper_loss=0.1034, over 23547.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01064, ecapa_loss=0.0001511, whisper_loss=0.09164, over 3970682.74 frames. ], batch size: 94, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:51:24,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3254720.0, ans=0.0 2024-08-15 15:51:28,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3254720.0, ans=0.0 2024-08-15 15:51:35,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.370e+01 2.592e+01 2.847e+01 4.238e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-15 15:51:39,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3254820.0, ans=0.1 2024-08-15 15:51:50,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. 
limit=15.0 2024-08-15 15:51:52,400 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-15 15:51:54,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3254920.0, ans=0.125 2024-08-15 15:52:13,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3255020.0, ans=0.125 2024-08-15 15:52:16,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3255120.0, ans=0.125 2024-08-15 15:52:33,175 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6700, loss[loss=0.1058, beats_loss=0.01315, ecapa_loss=0.0001118, whisper_loss=0.09155, over 23682.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001503, whisper_loss=0.09158, over 3942196.99 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:52:55,154 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-15 15:53:17,613 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-15 15:53:20,559 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-15 15:53:27,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3255520.0, ans=10.0 2024-08-15 15:53:31,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3255620.0, ans=6.0 2024-08-15 15:53:32,254 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-15 15:53:42,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3255620.0, ans=0.1 2024-08-15 15:53:45,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6750, loss[loss=0.1061, beats_loss=0.01011, ecapa_loss=0.0001527, whisper_loss=0.09444, over 22501.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001509, whisper_loss=0.09109, over 3926725.19 frames. ], batch size: 92, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:53:53,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3255720.0, ans=0.2 2024-08-15 15:54:01,210 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.293e+01 2.545e+01 2.878e+01 4.170e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-15 15:54:05,836 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-15 15:54:06,930 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-15 15:54:37,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3256020.0, ans=0.2 2024-08-15 15:54:46,681 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-15 15:54:50,884 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-15 15:54:51,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3256120.0, ans=0.125 2024-08-15 15:54:56,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6800, loss[loss=0.09362, beats_loss=0.01207, ecapa_loss=0.0001841, whisper_loss=0.07971, over 19207.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001513, whisper_loss=0.09147, over 3944458.78 frames. ], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:54:57,950 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 35 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-15 15:55:04,807 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-15 15:55:17,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-15 15:55:18,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2024-08-15 15:55:25,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2024-08-15 15:55:48,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3256520.0, ans=0.1 2024-08-15 15:55:58,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-15 15:56:06,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6850, loss[loss=0.1049, beats_loss=0.008918, ecapa_loss=0.0001262, whisper_loss=0.09476, over 14958.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001513, whisper_loss=0.091, over 3929540.11 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:56:07,884 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-15 15:56:08,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:56:09,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3256720.0, ans=0.0 2024-08-15 15:56:18,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2024-08-15 15:56:19,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3256720.0, ans=0.95 2024-08-15 15:56:22,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.268e+01 2.467e+01 2.871e+01 7.953e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-15 15:56:23,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3256820.0, ans=0.125 2024-08-15 15:56:23,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-08-15 15:56:26,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3256820.0, ans=0.0 2024-08-15 15:56:26,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. 
limit=6.0 2024-08-15 15:56:31,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3256820.0, ans=0.125 2024-08-15 15:56:35,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3256920.0, ans=0.125 2024-08-15 15:56:47,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=22.5 2024-08-15 15:56:52,285 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-15 15:57:05,772 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-15 15:57:09,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3257120.0, ans=0.125 2024-08-15 15:57:11,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2024-08-15 15:57:14,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3257120.0, ans=0.125 2024-08-15 15:57:14,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3257120.0, ans=0.2 2024-08-15 15:57:20,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6900, loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.0001544, whisper_loss=0.09151, over 19693.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0105, ecapa_loss=0.000151, whisper_loss=0.09191, over 3912791.65 frames. 
], batch size: 80, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:57:26,386 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 15:57:47,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3257320.0, ans=0.125 2024-08-15 15:57:48,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3257320.0, ans=0.0 2024-08-15 15:57:56,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3257420.0, ans=0.125 2024-08-15 15:57:59,323 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-15 15:58:02,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3257420.0, ans=0.125 2024-08-15 15:58:08,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-15 15:58:10,535 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-15 15:58:13,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3257520.0, ans=0.07 2024-08-15 15:58:34,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6950, loss[loss=0.08577, beats_loss=0.01306, ecapa_loss=0.0001104, whisper_loss=0.0716, over 16501.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01063, ecapa_loss=0.0001496, whisper_loss=0.09192, over 3906291.52 frames. 
], batch size: 64, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:58:38,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3257720.0, ans=0.0 2024-08-15 15:58:49,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.345e+01 2.623e+01 2.937e+01 1.105e+02, threshold=5.245e+01, percent-clipped=3.0 2024-08-15 15:58:57,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3257820.0, ans=0.125 2024-08-15 15:59:03,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3257920.0, ans=15.0 2024-08-15 15:59:20,969 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 10 from Vox, 26 from AS 2024-08-15 15:59:25,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3258020.0, ans=0.2 2024-08-15 15:59:28,118 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 19 from Vox, 16 from AS 2024-08-15 15:59:31,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3258120.0, ans=0.125 2024-08-15 15:59:34,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3258120.0, ans=0.0 2024-08-15 15:59:44,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7000, loss[loss=0.1017, beats_loss=0.01112, ecapa_loss=0.0001695, whisper_loss=0.08889, over 15517.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01067, ecapa_loss=0.0001495, whisper_loss=0.09158, over 3879460.18 frames. 
], batch size: 65, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 15:59:45,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2024-08-15 15:59:53,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2024-08-15 15:59:55,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2024-08-15 16:00:04,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3258320.0, ans=0.09899494936611666 2024-08-15 16:00:09,610 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 from AS 2024-08-15 16:00:34,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-08-15 16:00:52,607 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS 2024-08-15 16:00:53,701 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7050, loss[loss=0.1023, beats_loss=0.01108, ecapa_loss=0.0001356, whisper_loss=0.08988, over 16696.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01078, ecapa_loss=0.0001504, whisper_loss=0.09066, over 3857375.80 frames. 
], batch size: 65, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:01:06,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3258820.0, ans=0.2 2024-08-15 16:01:08,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.307e+01 2.519e+01 2.895e+01 2.053e+02, threshold=5.037e+01, percent-clipped=1.0 2024-08-15 16:01:16,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3258820.0, ans=0.2 2024-08-15 16:01:23,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3258920.0, ans=0.125 2024-08-15 16:01:38,863 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 from AS 2024-08-15 16:02:00,574 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:02:04,210 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7100, loss[loss=0.08463, beats_loss=0.01259, ecapa_loss=0.0001353, whisper_loss=0.07068, over 18540.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001487, whisper_loss=0.09042, over 3837168.60 frames. ], batch size: 78, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:02:22,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3259320.0, ans=0.2 2024-08-15 16:02:33,089 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 11 from Vox, 31 from AS 2024-08-15 16:02:57,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3259520.0, ans=0.025 2024-08-15 16:03:02,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3259620.0, ans=0.125 2024-08-15 16:03:04,274 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 from AS 2024-08-15 16:03:06,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3259620.0, ans=0.0 2024-08-15 16:03:15,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7150, loss[loss=0.09835, beats_loss=0.01145, ecapa_loss=0.0001555, whisper_loss=0.08534, over 20291.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01067, ecapa_loss=0.0001484, whisper_loss=0.0909, over 3852861.79 frames. ], batch size: 81, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:03:19,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2024-08-15 16:03:31,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.305e+01 2.549e+01 2.852e+01 2.933e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-15 16:03:48,266 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 from AS 2024-08-15 16:03:51,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3259920.0, ans=15.0 2024-08-15 16:04:01,170 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 15 from Vox, 36 from AS 2024-08-15 16:04:17,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3260120.0, ans=0.125 2024-08-15 16:04:26,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7200, loss[loss=0.1076, beats_loss=0.009394, ecapa_loss=0.0001559, whisper_loss=0.09669, over 22697.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01072, ecapa_loss=0.0001482, whisper_loss=0.09083, over 3899639.88 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:04:27,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2024-08-15 16:04:28,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3260220.0, ans=0.125 2024-08-15 16:04:33,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3260220.0, ans=0.125 2024-08-15 16:04:36,755 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 11 from Vox, 39 from AS 2024-08-15 16:04:44,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3260320.0, ans=0.125 2024-08-15 16:04:51,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3260320.0, ans=0.125 2024-08-15 16:05:01,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.96 vs. 
limit=15.0 2024-08-15 16:05:13,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3260520.0, ans=0.1 2024-08-15 16:05:37,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7250, loss[loss=0.1023, beats_loss=0.011, ecapa_loss=0.0001191, whisper_loss=0.09009, over 20603.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.000148, whisper_loss=0.09121, over 3931074.41 frames. ], batch size: 79, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:05:52,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.362e+01 2.587e+01 2.816e+01 1.917e+02, threshold=5.173e+01, percent-clipped=1.0 2024-08-15 16:06:04,900 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 from AS 2024-08-15 16:06:12,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3260920.0, ans=0.0 2024-08-15 16:06:27,724 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 16:06:29,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3261020.0, ans=0.125 2024-08-15 16:06:29,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3261020.0, ans=0.0 2024-08-15 16:06:46,840 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7300, loss[loss=0.09519, beats_loss=0.01162, ecapa_loss=0.0001178, whisper_loss=0.08239, over 18354.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01071, ecapa_loss=0.0001476, whisper_loss=0.09069, over 3892096.20 frames. 
], batch size: 72, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:06:47,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261220.0, ans=0.1 2024-08-15 16:07:02,251 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 from AS 2024-08-15 16:07:13,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3261420.0, ans=0.125 2024-08-15 16:07:17,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3261420.0, ans=0.125 2024-08-15 16:07:22,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3261420.0, ans=0.0 2024-08-15 16:07:33,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2024-08-15 16:07:48,237 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:07:52,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3261620.0, ans=0.1 2024-08-15 16:07:56,183 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 16:07:56,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3261720.0, ans=0.04949747468305833 2024-08-15 16:07:57,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7350, loss[loss=0.1167, beats_loss=0.01102, ecapa_loss=0.0001356, whisper_loss=0.1043, over 21192.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01072, ecapa_loss=0.0001494, whisper_loss=0.09024, over 3884051.51 frames. 
], batch size: 82, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:08:00,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3261720.0, ans=0.1 2024-08-15 16:08:02,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3261720.0, ans=0.5 2024-08-15 16:08:03,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3261720.0, ans=0.125 2024-08-15 16:08:13,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.328e+01 2.533e+01 2.862e+01 3.908e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-15 16:08:27,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3261920.0, ans=0.5 2024-08-15 16:08:34,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3261920.0, ans=0.125 2024-08-15 16:08:48,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3262020.0, ans=0.125 2024-08-15 16:08:55,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3262120.0, ans=0.015 2024-08-15 16:08:58,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3262120.0, ans=0.0 2024-08-15 16:09:08,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7400, loss[loss=0.1035, beats_loss=0.009641, ecapa_loss=0.0001484, whisper_loss=0.09234, over 22368.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01072, ecapa_loss=0.0001491, whisper_loss=0.09037, over 3869760.40 frames. 
], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:09:17,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-15 16:09:20,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2024-08-15 16:09:22,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3262320.0, ans=0.2 2024-08-15 16:09:29,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3262320.0, ans=0.2 2024-08-15 16:09:52,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3262520.0, ans=0.0 2024-08-15 16:10:03,575 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 from AS 2024-08-15 16:10:15,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3262620.0, ans=0.125 2024-08-15 16:10:17,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7450, loss[loss=0.1082, beats_loss=0.009596, ecapa_loss=0.0001426, whisper_loss=0.09714, over 18706.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001497, whisper_loss=0.0903, over 3863002.87 frames. ], batch size: 74, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:10:22,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3262720.0, ans=0.125 2024-08-15 16:10:31,891 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
20 from LS+wenet, 26 from Vox, 42 from AS 2024-08-15 16:10:32,934 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.336e+01 2.535e+01 2.838e+01 5.757e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-15 16:10:49,617 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 from AS 2024-08-15 16:10:56,498 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 30 from Vox, 19 from AS 2024-08-15 16:11:01,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3263020.0, ans=0.125 2024-08-15 16:11:12,491 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 20 from Vox, 20 from AS 2024-08-15 16:11:14,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3263120.0, ans=0.125 2024-08-15 16:11:20,232 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 17 from Vox, 50 from AS 2024-08-15 16:11:21,580 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-15 16:11:27,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7500, loss[loss=0.09001, beats_loss=0.01312, ecapa_loss=0.0001316, whisper_loss=0.07557, over 13774.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01066, ecapa_loss=0.0001498, whisper_loss=0.09035, over 3865318.63 frames. 
], batch size: 55, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:11:29,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3263220.0, ans=0.125 2024-08-15 16:11:30,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3263220.0, ans=0.125 2024-08-15 16:11:38,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3263220.0, ans=0.0 2024-08-15 16:11:47,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3263320.0, ans=0.125 2024-08-15 16:12:01,281 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 27 from Vox, 37 from AS 2024-08-15 16:12:05,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3263420.0, ans=0.125 2024-08-15 16:12:15,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3263520.0, ans=0.2 2024-08-15 16:12:15,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-08-15 16:12:25,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3263620.0, ans=0.0 2024-08-15 16:12:26,274 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 16:12:37,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7550, loss[loss=0.1098, beats_loss=0.007516, ecapa_loss=0.0001615, whisper_loss=0.1007, over 19878.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.00015, whisper_loss=0.08966, over 3843131.09 frames. 
], batch size: 76, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:12:51,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-15 16:12:52,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.288e+01 2.542e+01 2.895e+01 9.119e+01, threshold=5.085e+01, percent-clipped=2.0 2024-08-15 16:12:58,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3263820.0, ans=0.125 2024-08-15 16:13:09,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3263920.0, ans=0.0 2024-08-15 16:13:10,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=12.0 2024-08-15 16:13:11,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2024-08-15 16:13:15,514 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 26 from Vox, 26 from AS 2024-08-15 16:13:18,289 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-15 16:13:23,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3264020.0, ans=0.2 2024-08-15 16:13:47,671 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:13:48,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7600, loss[loss=0.08194, beats_loss=0.0124, ecapa_loss=0.0001728, whisper_loss=0.06782, over 21987.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.0001499, whisper_loss=0.08953, over 3859190.12 frames. 
], batch size: 96, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:13:56,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3264220.0, ans=0.125 2024-08-15 16:14:02,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3264320.0, ans=0.0 2024-08-15 16:14:28,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3264420.0, ans=0.0 2024-08-15 16:14:36,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3264520.0, ans=0.04949747468305833 2024-08-15 16:14:38,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2024-08-15 16:14:56,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3264620.0, ans=0.1 2024-08-15 16:15:00,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7650, loss[loss=0.1018, beats_loss=0.01153, ecapa_loss=0.0001408, whisper_loss=0.0889, over 20752.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001511, whisper_loss=0.08965, over 3864245.70 frames. ], batch size: 84, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:15:15,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.326e+01 2.582e+01 2.912e+01 5.220e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-15 16:15:15,994 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 19 from Vox, 43 from AS 2024-08-15 16:15:32,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3264920.0, ans=0.2 2024-08-15 16:15:37,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3264920.0, ans=0.125 2024-08-15 16:15:44,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-15 16:15:45,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3265020.0, ans=0.125 2024-08-15 16:16:02,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2024-08-15 16:16:09,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3265120.0, ans=0.1 2024-08-15 16:16:10,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2024-08-15 16:16:11,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7700, loss[loss=0.1144, beats_loss=0.009768, ecapa_loss=0.0001503, whisper_loss=0.1031, over 23937.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.000149, whisper_loss=0.09011, over 3871485.41 frames. ], batch size: 92, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:16:28,889 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS 2024-08-15 16:16:42,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.19 vs. 
limit=22.5 2024-08-15 16:17:06,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3265520.0, ans=0.5 2024-08-15 16:17:07,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3265520.0, ans=0.1 2024-08-15 16:17:11,877 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 from AS 2024-08-15 16:17:15,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3265620.0, ans=0.125 2024-08-15 16:17:17,296 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 11 from LS+wenet, 13 from Vox, 34 from AS 2024-08-15 16:17:19,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3265620.0, ans=0.07 2024-08-15 16:17:24,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7750, loss[loss=0.09326, beats_loss=0.01118, ecapa_loss=0.00015, whisper_loss=0.08058, over 23051.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001476, whisper_loss=0.09081, over 3904622.21 frames. ], batch size: 96, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:17:35,383 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 16:17:46,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.309e+01 2.587e+01 2.792e+01 3.462e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-15 16:17:54,842 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-15 16:18:03,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3265920.0, ans=0.125 2024-08-15 16:18:18,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3266020.0, ans=0.95 2024-08-15 16:18:21,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3266020.0, ans=0.125 2024-08-15 16:18:24,878 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 from AS 2024-08-15 16:18:50,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7800, loss[loss=0.09638, beats_loss=0.01132, ecapa_loss=0.000119, whisper_loss=0.08388, over 20133.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001465, whisper_loss=0.09083, over 3884975.50 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:18:51,067 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 22 from Vox, 21 from AS 2024-08-15 16:18:53,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3266220.0, ans=0.0 2024-08-15 16:19:16,909 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-15 16:19:27,998 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 16 from Vox, 37 from AS 2024-08-15 16:19:30,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3266420.0, ans=0.1 2024-08-15 16:19:42,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3266420.0, ans=0.0 2024-08-15 16:19:55,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3266520.0, ans=0.0 2024-08-15 16:19:57,432 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 16:20:14,947 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-15 16:20:15,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3266620.0, ans=0.125 2024-08-15 16:20:20,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-08-15 16:20:32,359 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 21 from Vox, 19 from AS 2024-08-15 16:20:33,326 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7850, loss[loss=0.09703, beats_loss=0.006532, ecapa_loss=0.0001652, whisper_loss=0.08885, over 15784.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01053, ecapa_loss=0.0001477, whisper_loss=0.09176, over 3903670.08 frames. 
], batch size: 61, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:20:34,129 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:20:45,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3266720.0, ans=0.125 2024-08-15 16:20:56,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.312e+01 2.657e+01 2.999e+01 5.998e+01, threshold=5.314e+01, percent-clipped=1.0 2024-08-15 16:21:05,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3266820.0, ans=0.125 2024-08-15 16:21:10,242 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS 2024-08-15 16:21:14,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3266920.0, ans=0.0 2024-08-15 16:21:44,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3267020.0, ans=0.125 2024-08-15 16:21:49,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3267020.0, ans=0.125 2024-08-15 16:22:05,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-15 16:22:09,286 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 16:22:21,326 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7900, loss[loss=0.09065, beats_loss=0.01178, ecapa_loss=0.0001287, whisper_loss=0.07758, over 16254.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01062, ecapa_loss=0.0001474, whisper_loss=0.09134, over 3898756.49 frames. 
], batch size: 64, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:22:39,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5 2024-08-15 16:22:42,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3267220.0, ans=0.0 2024-08-15 16:23:13,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3267420.0, ans=0.0 2024-08-15 16:23:17,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3267420.0, ans=0.125 2024-08-15 16:23:21,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=3267420.0, ans=22.5 2024-08-15 16:24:04,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3267620.0, ans=0.2 2024-08-15 16:24:17,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2024-08-15 16:24:27,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7950, loss[loss=0.08928, beats_loss=0.01106, ecapa_loss=0.0001377, whisper_loss=0.07684, over 15951.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.0001472, whisper_loss=0.09155, over 3901565.08 frames. 
], batch size: 65, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:24:40,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3267720.0, ans=0.125 2024-08-15 16:24:53,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3267820.0, ans=0.2 2024-08-15 16:24:53,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.363e+01 2.541e+01 2.931e+01 3.622e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-15 16:24:56,641 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 23 from Vox, 27 from AS 2024-08-15 16:25:05,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3267820.0, ans=0.1 2024-08-15 16:25:10,502 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 25 from Vox, 31 from AS 2024-08-15 16:25:17,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3267920.0, ans=0.1 2024-08-15 16:26:20,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3268120.0, ans=0.125 2024-08-15 16:26:33,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8000, loss[loss=0.104, beats_loss=0.01026, ecapa_loss=0.0001798, whisper_loss=0.09191, over 17150.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01057, ecapa_loss=0.000148, whisper_loss=0.09218, over 3907947.08 frames. ], batch size: 72, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:26:37,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.18 vs. 
limit=22.5 2024-08-15 16:26:40,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3268220.0, ans=0.1 2024-08-15 16:26:46,111 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 23 from Vox, 26 from AS 2024-08-15 16:27:12,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3268320.0, ans=0.025 2024-08-15 16:27:13,093 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 from AS 2024-08-15 16:27:33,831 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 34 from LS+wenet, 18 from Vox, 30 from AS 2024-08-15 16:27:37,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3268420.0, ans=0.0 2024-08-15 16:27:45,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3268520.0, ans=0.0 2024-08-15 16:27:50,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3268520.0, ans=0.2 2024-08-15 16:28:02,620 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 24 from Vox, 28 from AS 2024-08-15 16:28:11,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2024-08-15 16:28:14,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. limit=5.0 2024-08-15 16:28:15,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8050, loss[loss=0.07946, beats_loss=0.01211, ecapa_loss=0.0001478, whisper_loss=0.06588, over 14496.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01053, ecapa_loss=0.0001483, whisper_loss=0.09223, over 3921232.57 frames. 
], batch size: 57, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:28:24,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3268720.0, ans=0.5 2024-08-15 16:28:32,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.283e+01 2.526e+01 2.890e+01 4.835e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 16:28:54,158 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 from AS 2024-08-15 16:28:56,009 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 from AS 2024-08-15 16:28:57,970 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 from AS 2024-08-15 16:29:02,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3269020.0, ans=0.1 2024-08-15 16:29:02,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3269020.0, ans=0.125 2024-08-15 16:29:15,836 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 16:29:20,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3269120.0, ans=0.0 2024-08-15 16:29:24,694 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 10 from Vox, 25 from AS 2024-08-15 16:29:33,861 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 16:29:35,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8100, loss[loss=0.09903, beats_loss=0.01007, ecapa_loss=0.0001519, whisper_loss=0.08745, over 22775.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01052, ecapa_loss=0.0001491, whisper_loss=0.09212, over 3909503.96 frames. 
], batch size: 91, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:29:45,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3269220.0, ans=0.125 2024-08-15 16:29:48,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3269220.0, ans=0.125 2024-08-15 16:29:58,709 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 from AS 2024-08-15 16:29:59,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3269320.0, ans=0.125 2024-08-15 16:30:19,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3269420.0, ans=0.2 2024-08-15 16:30:37,357 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.006e-03 2024-08-15 16:30:56,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8150, loss[loss=0.1143, beats_loss=0.009196, ecapa_loss=0.0001508, whisper_loss=0.1036, over 16044.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01046, ecapa_loss=0.0001498, whisper_loss=0.09235, over 3904612.31 frames. ], batch size: 62, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:31:15,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.201e+01 2.455e+01 2.771e+01 3.780e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-15 16:31:30,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3269920.0, ans=0.0 2024-08-15 16:31:49,334 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 16:32:16,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3270220.0, ans=0.0 2024-08-15 16:32:16,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8200, loss[loss=0.1038, beats_loss=0.01171, ecapa_loss=0.0001191, whisper_loss=0.09091, over 17543.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001512, whisper_loss=0.09191, over 3877784.05 frames. ], batch size: 69, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:32:23,082 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 20 from Vox, 18 from AS 2024-08-15 16:32:24,787 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 from AS 2024-08-15 16:32:42,873 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 from AS 2024-08-15 16:32:51,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3270420.0, ans=0.125 2024-08-15 16:33:14,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3270520.0, ans=10.0 2024-08-15 16:33:16,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3270520.0, ans=0.0 2024-08-15 16:33:17,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3270620.0, ans=0.0 2024-08-15 16:33:29,785 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 13 from Vox, 37 from AS 2024-08-15 16:33:34,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8250, loss[loss=0.1099, beats_loss=0.011, ecapa_loss=0.0001675, whisper_loss=0.09726, over 18717.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01066, ecapa_loss=0.0001497, whisper_loss=0.09117, over 3881775.27 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:33:50,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.400e+01 2.685e+01 3.048e+01 2.636e+02, threshold=5.369e+01, percent-clipped=3.0 2024-08-15 16:34:00,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. limit=10.0 2024-08-15 16:34:07,485 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 from AS 2024-08-15 16:34:10,755 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 from AS 2024-08-15 16:34:35,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3271120.0, ans=0.2 2024-08-15 16:34:40,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3271120.0, ans=0.125 2024-08-15 16:34:43,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3271120.0, ans=0.1 2024-08-15 16:34:48,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8300, loss[loss=0.0966, beats_loss=0.009943, ecapa_loss=0.0001297, whisper_loss=0.08536, over 19637.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001486, whisper_loss=0.0906, over 3886168.62 frames. ], batch size: 77, lr: 2.70e-03, grad_scale: 1.152921504606847e+18 2024-08-15 16:34:52,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3271220.0, ans=0.0 2024-08-15 16:34:57,906 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 11 from Vox, 37 from AS 2024-08-15 16:34:58,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3271220.0, ans=0.125 2024-08-15 16:35:14,483 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 from AS 2024-08-15 16:35:19,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-08-15 16:35:28,419 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 9 from Vox, 43 from AS 2024-08-15 16:35:33,236 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 from AS 2024-08-15 16:35:42,060 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 16:35:59,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-15 16:36:02,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8350, loss[loss=0.1003, beats_loss=0.01192, ecapa_loss=0.0001188, whisper_loss=0.08716, over 22563.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01077, ecapa_loss=0.0001487, whisper_loss=0.09014, over 3902435.31 frames. 
], batch size: 90, lr: 2.70e-03, grad_scale: 1.152921504606847e+18 2024-08-15 16:36:18,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.348e+01 2.577e+01 2.897e+01 4.165e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-15 16:36:35,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3271920.0, ans=0.05 2024-08-15 16:36:41,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3271920.0, ans=0.125 2024-08-15 16:36:46,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-15 16:36:48,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3272020.0, ans=0.0 2024-08-15 16:36:50,164 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-15 16:36:55,258 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.375e+00 2024-08-15 16:36:59,307 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 from AS 2024-08-15 16:37:07,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3272120.0, ans=0.1 2024-08-15 16:37:17,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8400, loss[loss=0.09744, beats_loss=0.00924, ecapa_loss=0.0001654, whisper_loss=0.08654, over 15009.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01065, ecapa_loss=0.0001493, whisper_loss=0.09078, over 3863728.43 frames. ], batch size: 62, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:37:32,882 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
18 from LS+wenet, 32 from Vox, 34 from AS 2024-08-15 16:37:55,225 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 from AS 2024-08-15 16:38:32,755 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8450, loss[loss=0.09087, beats_loss=0.008926, ecapa_loss=0.0001972, whisper_loss=0.07997, over 22667.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001503, whisper_loss=0.09104, over 3859855.69 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:38:37,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3272720.0, ans=0.2 2024-08-15 16:38:50,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.308e+01 2.508e+01 2.815e+01 5.021e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-15 16:39:02,831 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 from AS 2024-08-15 16:39:10,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3272920.0, ans=0.1 2024-08-15 16:39:13,261 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 from AS 2024-08-15 16:39:13,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3272920.0, ans=0.125 2024-08-15 16:39:32,820 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-15 16:39:47,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8500, loss[loss=0.1016, beats_loss=0.01098, ecapa_loss=0.0001798, whisper_loss=0.08883, over 13215.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.000152, whisper_loss=0.09147, over 3859250.02 frames. 
], batch size: 53, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:39:58,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-15 16:40:03,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0 2024-08-15 16:40:06,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3273320.0, ans=0.125 2024-08-15 16:40:11,391 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 from AS 2024-08-15 16:40:19,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3273420.0, ans=0.5 2024-08-15 16:40:27,182 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-15 16:40:28,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=12.0 2024-08-15 16:40:38,048 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 from AS 2024-08-15 16:40:48,953 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 from AS 2024-08-15 16:40:52,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3273620.0, ans=0.1 2024-08-15 16:41:05,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8550, loss[loss=0.1296, beats_loss=0.009915, ecapa_loss=0.0001213, whisper_loss=0.1184, over 23559.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001506, whisper_loss=0.0913, over 3838377.37 frames. 
], batch size: 88, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:41:21,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3273820.0, ans=0.0 2024-08-15 16:41:23,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.398e+01 2.637e+01 2.998e+01 4.357e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-15 16:41:53,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3274020.0, ans=0.0 2024-08-15 16:42:07,871 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 21 from Vox, 48 from AS 2024-08-15 16:42:21,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8600, loss[loss=0.1199, beats_loss=0.009069, ecapa_loss=0.0001268, whisper_loss=0.1095, over 18293.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001508, whisper_loss=0.09108, over 3823901.73 frames. ], batch size: 69, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:42:24,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=22.5 2024-08-15 16:42:37,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3274320.0, ans=0.125 2024-08-15 16:42:51,839 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 40 from LS+wenet, 18 from Vox, 35 from AS 2024-08-15 16:42:55,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. 
limit=10.0 2024-08-15 16:43:24,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3274620.0, ans=0.0 2024-08-15 16:43:34,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3274620.0, ans=0.125 2024-08-15 16:43:37,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8650, loss[loss=0.1478, beats_loss=0.007208, ecapa_loss=0.0001492, whisper_loss=0.1391, over 22578.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001492, whisper_loss=0.09056, over 3814044.96 frames. ], batch size: 84, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:43:51,703 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 from AS 2024-08-15 16:43:53,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3274820.0, ans=0.0 2024-08-15 16:43:55,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.305e+01 2.531e+01 2.832e+01 4.112e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:44:15,346 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 22 from Vox, 31 from AS 2024-08-15 16:44:18,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3274920.0, ans=0.0 2024-08-15 16:44:21,509 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 from AS 2024-08-15 16:44:28,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-15 16:44:49,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.03 vs. 
limit=12.0 2024-08-15 16:44:53,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8700, loss[loss=0.09303, beats_loss=0.01039, ecapa_loss=0.000174, whisper_loss=0.0809, over 22821.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.09024, over 3818068.09 frames. ], batch size: 95, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:45:09,611 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 from AS 2024-08-15 16:45:12,670 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 from AS 2024-08-15 16:45:25,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3275420.0, ans=0.2 2024-08-15 16:45:28,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0 2024-08-15 16:45:28,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5 2024-08-15 16:45:31,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275420.0, ans=0.1 2024-08-15 16:45:33,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3275420.0, ans=0.125 2024-08-15 16:45:55,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2024-08-15 16:46:01,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3275620.0, ans=0.5 2024-08-15 16:46:11,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8750, loss[loss=0.09934, beats_loss=0.008297, ecapa_loss=0.000117, whisper_loss=0.08987, over 15351.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.09061, over 3808148.12 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:46:12,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3275720.0, ans=0.1 2024-08-15 16:46:13,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3275720.0, ans=0.125 2024-08-15 16:46:22,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275720.0, ans=0.1 2024-08-15 16:46:23,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3275720.0, ans=0.125 2024-08-15 16:46:29,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.332e+01 2.573e+01 2.934e+01 5.671e+01, threshold=5.146e+01, percent-clipped=2.0 2024-08-15 16:46:47,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-08-15 16:47:01,722 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 from AS 2024-08-15 16:47:10,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.97 vs. 
limit=15.0 2024-08-15 16:47:27,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3276120.0, ans=0.125 2024-08-15 16:47:39,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8800, loss[loss=0.0923, beats_loss=0.01026, ecapa_loss=0.000149, whisper_loss=0.08056, over 18321.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.000149, whisper_loss=0.09047, over 3845513.60 frames. ], batch size: 69, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:47:42,505 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 15 from LS+wenet, 20 from Vox, 46 from AS 2024-08-15 16:48:08,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3276320.0, ans=0.0 2024-08-15 16:49:15,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8850, loss[loss=0.1031, beats_loss=0.01281, ecapa_loss=0.0001116, whisper_loss=0.08914, over 22313.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01077, ecapa_loss=0.0001475, whisper_loss=0.08968, over 3818917.23 frames. ], batch size: 86, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:49:16,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2024-08-15 16:49:23,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3276720.0, ans=0.0 2024-08-15 16:49:25,474 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 31 from Vox, 33 from AS 2024-08-15 16:49:31,067 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 17 from Vox, 21 from AS 2024-08-15 16:49:35,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3276820.0, ans=0.2 2024-08-15 16:49:36,987 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.293e+01 2.672e+01 3.024e+01 1.700e+02, threshold=5.345e+01, percent-clipped=3.0 2024-08-15 16:49:51,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3276920.0, ans=0.04949747468305833 2024-08-15 16:50:20,129 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 from AS 2024-08-15 16:50:25,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-15 16:50:35,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8900, loss[loss=0.1095, beats_loss=0.009605, ecapa_loss=0.0001292, whisper_loss=0.09859, over 15448.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001482, whisper_loss=0.09015, over 3783073.59 frames. ], batch size: 55, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:50:38,708 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 16:50:46,397 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 from AS 2024-08-15 16:50:56,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3277320.0, ans=0.125 2024-08-15 16:51:03,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3277320.0, ans=0.95 2024-08-15 16:51:03,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3277320.0, ans=0.125 2024-08-15 16:51:08,429 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 from AS 2024-08-15 16:51:09,885 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 from AS 2024-08-15 16:51:11,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3277420.0, ans=0.0 2024-08-15 16:51:12,924 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 from AS 2024-08-15 16:51:33,350 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 14 from Vox, 32 from AS 2024-08-15 16:51:42,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-08-15 16:51:46,538 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 from AS 2024-08-15 16:51:49,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8950, loss[loss=0.107, beats_loss=0.009913, ecapa_loss=0.0001534, whisper_loss=0.09552, over 16450.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001482, whisper_loss=0.09062, over 3826228.42 frames. 
], batch size: 66, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:51:53,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-08-15 16:51:54,222 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 25 from Vox, 33 from AS 2024-08-15 16:52:06,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.302e+01 2.531e+01 2.768e+01 4.662e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-15 16:52:25,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3277920.0, ans=0.1 2024-08-15 16:52:36,581 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 from AS 2024-08-15 16:52:55,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=8.0 2024-08-15 16:52:57,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3278120.0, ans=0.0 2024-08-15 16:53:01,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-15 16:53:01,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9000, loss[loss=0.1165, beats_loss=0.00881, ecapa_loss=0.0001521, whisper_loss=0.1061, over 21411.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001483, whisper_loss=0.09018, over 3858302.55 frames. 
], batch size: 84, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:53:01,957 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 16:53:36,327 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2547, 2.9426, 2.4699, 1.6110], device='cuda:3') 2024-08-15 16:53:39,110 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2514, beats_loss=0, ecapa_loss=0.0005338, whisper_loss=0.2461, over 922467.00 frames. 2024-08-15 16:53:57,657 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004212, beats_loss=0, ecapa_loss=0.0004212, whisper_loss=0, over 939242.00 frames. 2024-08-15 16:55:05,105 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7057, 2.2772, 2.0120, 2.0129], device='cuda:3') 2024-08-15 16:55:49,299 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02337, beats_loss=0.02337, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 16:55:49,303 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 16:55:51,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-15 16:55:53,778 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 from AS 2024-08-15 16:55:59,518 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 from AS 2024-08-15 16:56:14,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3278320.0, ans=0.125 2024-08-15 16:56:15,784 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
22 from LS+wenet, 25 from Vox, 18 from AS 2024-08-15 16:56:19,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2024-08-15 16:56:23,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3278420.0, ans=0.07 2024-08-15 16:56:35,324 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 from AS 2024-08-15 16:56:54,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.45 vs. limit=22.5 2024-08-15 16:56:59,374 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 from AS 2024-08-15 16:57:03,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9050, loss[loss=0.1152, beats_loss=0.008905, ecapa_loss=0.0001412, whisper_loss=0.1048, over 21247.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001487, whisper_loss=0.08979, over 3834056.75 frames. ], batch size: 83, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:57:08,466 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 from AS 2024-08-15 16:57:21,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.395e+01 2.682e+01 2.921e+01 1.898e+02, threshold=5.364e+01, percent-clipped=1.0 2024-08-15 16:57:26,469 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 from AS 2024-08-15 16:57:32,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-15 16:57:40,995 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-15 16:57:44,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3278920.0, ans=0.125 2024-08-15 16:57:54,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3279020.0, ans=0.0 2024-08-15 16:58:09,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3279120.0, ans=0.035 2024-08-15 16:58:09,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3279120.0, ans=10.0 2024-08-15 16:58:10,540 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 12 from Vox, 28 from AS 2024-08-15 16:58:17,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9100, loss[loss=0.1153, beats_loss=0.009695, ecapa_loss=0.0001386, whisper_loss=0.1042, over 18992.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001499, whisper_loss=0.08972, over 3830530.86 frames. ], batch size: 68, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:58:22,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3279220.0, ans=0.125 2024-08-15 16:58:34,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3279320.0, ans=0.2 2024-08-15 16:58:52,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.01 vs. 
limit=22.5 2024-08-15 16:58:54,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3279420.0, ans=0.1 2024-08-15 16:59:11,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-08-15 16:59:30,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9150, loss[loss=0.1022, beats_loss=0.009614, ecapa_loss=0.0001438, whisper_loss=0.09117, over 23142.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.00015, whisper_loss=0.09041, over 3875477.82 frames. ], batch size: 93, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 16:59:33,252 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 27 from Vox, 46 from AS 2024-08-15 16:59:39,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3279720.0, ans=0.0 2024-08-15 16:59:42,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3279720.0, ans=0.0 2024-08-15 16:59:45,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3279820.0, ans=0.1 2024-08-15 16:59:47,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.265e+01 2.493e+01 2.729e+01 3.385e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 16:59:49,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3279820.0, ans=0.07 2024-08-15 17:00:41,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3280120.0, ans=0.125 2024-08-15 17:00:46,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9200, loss[loss=0.09367, beats_loss=0.0112, 
ecapa_loss=0.000136, whisper_loss=0.08111, over 19217.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001494, whisper_loss=0.09049, over 3844750.43 frames. ], batch size: 75, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:00:51,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3280220.0, ans=15.0 2024-08-15 17:01:01,682 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 17:01:08,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3280320.0, ans=0.125 2024-08-15 17:01:09,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3280320.0, ans=0.125 2024-08-15 17:01:21,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3280420.0, ans=0.0 2024-08-15 17:01:40,207 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 from AS 2024-08-15 17:01:41,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3280520.0, ans=0.0 2024-08-15 17:01:51,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3280620.0, ans=0.02 2024-08-15 17:01:53,819 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 from AS 2024-08-15 17:01:54,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3280620.0, ans=15.0 2024-08-15 17:02:00,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9250, loss[loss=0.1201, beats_loss=0.009252, ecapa_loss=0.0001434, whisper_loss=0.1094, over 16258.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01058, ecapa_loss=0.0001498, whisper_loss=0.09063, over 3854759.78 frames. ], batch size: 62, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:02:17,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.360e+01 2.606e+01 2.888e+01 4.280e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-15 17:02:21,794 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 from AS 2024-08-15 17:02:38,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.20 vs. limit=22.5 2024-08-15 17:02:39,426 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 from AS 2024-08-15 17:02:48,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3281020.0, ans=0.0 2024-08-15 17:03:06,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281120.0, ans=0.1 2024-08-15 17:03:08,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3281120.0, ans=0.0 2024-08-15 17:03:14,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9300, loss[loss=0.08706, beats_loss=0.0129, ecapa_loss=0.0001461, whisper_loss=0.0727, over 14605.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09069, over 3863427.68 frames. ], batch size: 58, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:03:18,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.67 vs. 
limit=22.5 2024-08-15 17:03:26,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-08-15 17:03:39,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3281320.0, ans=0.035 2024-08-15 17:03:40,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3281320.0, ans=0.125 2024-08-15 17:03:44,170 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 from AS 2024-08-15 17:03:51,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3281420.0, ans=0.05 2024-08-15 17:03:55,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3281420.0, ans=0.125 2024-08-15 17:04:00,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3281520.0, ans=0.125 2024-08-15 17:04:13,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3281520.0, ans=15.0 2024-08-15 17:04:32,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9350, loss[loss=0.09829, beats_loss=0.008556, ecapa_loss=0.0001701, whisper_loss=0.08803, over 20922.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001498, whisper_loss=0.09018, over 3874364.64 frames. 
], batch size: 84, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:04:51,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.526e+01 2.881e+01 4.072e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-15 17:05:04,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3281920.0, ans=0.125 2024-08-15 17:05:19,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3282020.0, ans=0.125 2024-08-15 17:05:20,096 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 from AS 2024-08-15 17:05:23,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3282020.0, ans=0.0 2024-08-15 17:05:49,185 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9400, loss[loss=0.0851, beats_loss=0.00887, ecapa_loss=0.0001869, whisper_loss=0.07437, over 13102.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01069, ecapa_loss=0.00015, whisper_loss=0.08974, over 3848023.70 frames. ], batch size: 54, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:06:15,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3282320.0, ans=0.125 2024-08-15 17:06:28,513 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 from AS 2024-08-15 17:06:42,661 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 19 from Vox, 40 from AS 2024-08-15 17:06:54,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3282620.0, ans=0.125 2024-08-15 17:06:56,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3282620.0, ans=0.125 2024-08-15 17:07:02,350 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS 2024-08-15 17:07:08,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9450, loss[loss=0.133, beats_loss=0.006839, ecapa_loss=0.0001529, whisper_loss=0.1246, over 23335.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001497, whisper_loss=0.09001, over 3857008.98 frames. ], batch size: 86, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:07:27,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.277e+01 2.590e+01 2.788e+01 7.153e+01, threshold=5.181e+01, percent-clipped=1.0 2024-08-15 17:07:48,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3282920.0, ans=0.125 2024-08-15 17:08:09,632 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 from AS 2024-08-15 17:08:21,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-15 17:08:26,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9500, loss[loss=0.07805, beats_loss=0.01146, ecapa_loss=0.0001656, whisper_loss=0.06494, over 13854.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001498, whisper_loss=0.08929, over 3846839.45 frames. ], batch size: 59, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:08:33,843 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 22 from Vox, 28 from AS 2024-08-15 17:09:01,483 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 from AS 2024-08-15 17:09:16,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3283520.0, ans=0.0 2024-08-15 17:09:20,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3283520.0, ans=0.0 2024-08-15 17:09:25,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3283620.0, ans=0.125 2024-08-15 17:09:35,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283620.0, ans=0.1 2024-08-15 17:09:40,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9550, loss[loss=0.1074, beats_loss=0.01008, ecapa_loss=0.000165, whisper_loss=0.09568, over 22416.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001505, whisper_loss=0.08919, over 3874761.16 frames. ], batch size: 90, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:09:48,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3283720.0, ans=0.125 2024-08-15 17:09:57,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.381e+01 2.631e+01 2.929e+01 4.005e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-15 17:10:00,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. 
limit=15.0 2024-08-15 17:10:03,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3283820.0, ans=0.0 2024-08-15 17:10:39,981 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-15 17:10:41,263 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 15 from Vox, 37 from AS 2024-08-15 17:10:54,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9600, loss[loss=0.106, beats_loss=0.009287, ecapa_loss=0.0001977, whisper_loss=0.0947, over 20646.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01063, ecapa_loss=0.0001512, whisper_loss=0.08913, over 3852913.94 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:11:02,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3284220.0, ans=0.2 2024-08-15 17:11:06,268 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 from AS 2024-08-15 17:11:30,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3284420.0, ans=0.125 2024-08-15 17:11:40,941 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.853e+01 2024-08-15 17:11:43,648 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 14 from Vox, 33 from AS 2024-08-15 17:11:54,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=12.0 2024-08-15 17:12:08,224 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9650, loss[loss=0.1041, beats_loss=0.009156, ecapa_loss=0.0001544, whisper_loss=0.09337, over 15891.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001514, whisper_loss=0.08951, over 3839175.43 frames. 
], batch size: 62, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:12:11,273 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 from AS 2024-08-15 17:12:21,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3284820.0, ans=0.2 2024-08-15 17:12:25,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.315e+01 2.525e+01 2.831e+01 4.515e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-15 17:12:32,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3284820.0, ans=0.0 2024-08-15 17:12:45,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=12.0 2024-08-15 17:12:48,047 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 from AS 2024-08-15 17:12:49,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3284920.0, ans=0.125 2024-08-15 17:13:01,764 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 from AS 2024-08-15 17:13:02,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-15 17:13:05,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3285020.0, ans=0.0 2024-08-15 17:13:21,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9700, loss[loss=0.08209, beats_loss=0.01201, ecapa_loss=0.0001403, whisper_loss=0.06868, over 16442.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001528, whisper_loss=0.08925, over 3840728.16 frames. 
], batch size: 70, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:13:26,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-15 17:14:02,477 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 17:14:17,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3285320.0, ans=0.0 2024-08-15 17:15:03,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3285520.0, ans=0.125 2024-08-15 17:15:05,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3285620.0, ans=0.2 2024-08-15 17:15:12,604 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 29 from Vox, 34 from AS 2024-08-15 17:15:15,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3285620.0, ans=0.125 2024-08-15 17:15:19,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9750, loss[loss=0.08324, beats_loss=0.01125, ecapa_loss=0.0001535, whisper_loss=0.07045, over 21290.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001529, whisper_loss=0.08965, over 3841600.35 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:15:39,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3285820.0, ans=0.125 2024-08-15 17:15:40,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.300e+01 2.530e+01 2.812e+01 4.314e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-15 17:15:48,290 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
28 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 17:15:59,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3285920.0, ans=0.1 2024-08-15 17:16:40,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3286120.0, ans=0.125 2024-08-15 17:16:48,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3286120.0, ans=0.125 2024-08-15 17:16:49,816 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-15 17:17:02,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9800, loss[loss=0.1089, beats_loss=0.01053, ecapa_loss=0.0001365, whisper_loss=0.09699, over 22700.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.0001513, whisper_loss=0.08887, over 3842867.49 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:17:05,218 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 13 from Vox, 25 from AS 2024-08-15 17:17:16,880 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 30 from Vox, 28 from AS 2024-08-15 17:17:36,435 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 from AS 2024-08-15 17:17:46,982 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 17:18:04,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-08-15 17:18:09,111 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 from AS 2024-08-15 17:18:15,815 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 from AS 2024-08-15 17:18:18,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3286520.0, ans=0.125 2024-08-15 17:18:32,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3286620.0, ans=0.1 2024-08-15 17:18:52,944 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.882e-01 2024-08-15 17:18:53,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3286720.0, ans=0.125 2024-08-15 17:18:53,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9850, loss[loss=0.1148, beats_loss=0.01054, ecapa_loss=0.0001499, whisper_loss=0.1028, over 23048.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001507, whisper_loss=0.08939, over 3852264.50 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:18:58,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3286720.0, ans=0.05 2024-08-15 17:19:01,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3286720.0, ans=0.125 2024-08-15 17:19:11,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3286720.0, ans=0.125 2024-08-15 17:19:24,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.372e+01 2.632e+01 2.935e+01 4.456e+01, threshold=5.264e+01, percent-clipped=0.0 2024-08-15 17:19:38,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3286820.0, ans=0.125 2024-08-15 17:20:17,617 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3287020.0, ans=0.125 2024-08-15 17:20:57,571 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 from AS 2024-08-15 17:20:57,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3287220.0, ans=0.125 2024-08-15 17:20:58,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9900, loss[loss=0.1051, beats_loss=0.01062, ecapa_loss=0.0001452, whisper_loss=0.09307, over 21540.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001507, whisper_loss=0.08954, over 3868713.88 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:21:18,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3287220.0, ans=0.0 2024-08-15 17:21:30,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2024-08-15 17:22:20,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3287520.0, ans=0.1 2024-08-15 17:22:36,180 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 16 from Vox, 34 from AS 2024-08-15 17:23:01,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9950, loss[loss=0.1029, beats_loss=0.007682, ecapa_loss=0.0001912, whisper_loss=0.0933, over 21164.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01068, ecapa_loss=0.0001507, whisper_loss=0.08906, over 3864183.55 frames. 
], batch size: 88, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:23:31,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.466e+01 2.723e+01 3.016e+01 5.091e+01, threshold=5.446e+01, percent-clipped=0.0 2024-08-15 17:24:03,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3287920.0, ans=0.035 2024-08-15 17:24:21,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3288020.0, ans=0.0 2024-08-15 17:24:30,790 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 from AS 2024-08-15 17:24:31,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2024-08-15 17:24:37,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3288120.0, ans=0.0 2024-08-15 17:24:50,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10000, loss[loss=0.07363, beats_loss=0.01231, ecapa_loss=0.0001876, whisper_loss=0.05945, over 20808.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.0001513, whisper_loss=0.08937, over 3885502.52 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:24:56,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3288220.0, ans=0.125 2024-08-15 17:25:01,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3288220.0, ans=0.0 2024-08-15 17:25:03,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3288220.0, ans=0.125 2024-08-15 17:25:47,412 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
22 from LS+wenet, 25 from Vox, 31 from AS 2024-08-15 17:25:49,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-15 17:26:18,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10050, loss[loss=0.09595, beats_loss=0.01042, ecapa_loss=0.0001541, whisper_loss=0.08399, over 19798.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01064, ecapa_loss=0.0001502, whisper_loss=0.08913, over 3862180.89 frames. ], batch size: 84, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:26:29,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-08-15 17:26:40,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.323e+01 2.519e+01 2.738e+01 4.374e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-15 17:27:22,356 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 from AS 2024-08-15 17:27:27,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3289020.0, ans=0.125 2024-08-15 17:27:45,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3289120.0, ans=0.125 2024-08-15 17:27:47,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10100, loss[loss=0.09974, beats_loss=0.01183, ecapa_loss=0.0001548, whisper_loss=0.08636, over 21503.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01069, ecapa_loss=0.0001493, whisper_loss=0.0893, over 3900420.00 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17 2024-08-15 17:28:09,521 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
37 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 17:28:15,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3289320.0, ans=10.0
2024-08-15 17:28:29,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3289420.0, ans=0.2
2024-08-15 17:28:39,674 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 from AS
2024-08-15 17:28:40,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0
2024-08-15 17:28:58,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3289520.0, ans=0.125
2024-08-15 17:29:18,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10150, loss[loss=0.1087, beats_loss=0.009642, ecapa_loss=0.0001522, whisper_loss=0.09755, over 22575.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01068, ecapa_loss=0.0001503, whisper_loss=0.08988, over 3920312.60 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:29:26,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3289720.0, ans=0.0
2024-08-15 17:29:37,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289820.0, ans=0.1
2024-08-15 17:29:39,396 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.314e+01 2.537e+01 2.890e+01 1.648e+02, threshold=5.074e+01, percent-clipped=2.0
2024-08-15 17:29:40,068 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
19 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 17:29:50,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3289820.0, ans=0.0
2024-08-15 17:29:57,726 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 21 from Vox, 50 from AS
2024-08-15 17:29:59,580 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 29 from Vox, 26 from AS
2024-08-15 17:30:02,665 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 from AS
2024-08-15 17:30:19,026 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 from AS
2024-08-15 17:30:41,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10200, loss[loss=0.08211, beats_loss=0.009185, ecapa_loss=0.0001515, whisper_loss=0.07141, over 15581.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001509, whisper_loss=0.09032, over 3897029.90 frames. ], batch size: 59, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:30:44,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3290220.0, ans=0.1
2024-08-15 17:31:09,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3290320.0, ans=0.0
2024-08-15 17:31:09,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5
2024-08-15 17:31:18,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0
2024-08-15 17:31:28,622 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 14 from Vox, 28 from AS
2024-08-15 17:31:47,495 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
23 from LS+wenet, 22 from Vox, 33 from AS
2024-08-15 17:32:05,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10250, loss[loss=0.09234, beats_loss=0.0123, ecapa_loss=0.0001154, whisper_loss=0.07889, over 15334.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001519, whisper_loss=0.09067, over 3910452.61 frames. ], batch size: 60, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:32:25,137 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.349e+01 2.551e+01 2.894e+01 3.006e+02, threshold=5.101e+01, percent-clipped=3.0
2024-08-15 17:32:26,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3290820.0, ans=0.125
2024-08-15 17:32:32,312 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 from AS
2024-08-15 17:32:39,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3290920.0, ans=0.0
2024-08-15 17:32:59,880 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-15 17:33:23,467 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 23 from Vox, 17 from AS
2024-08-15 17:33:24,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3291120.0, ans=0.1
2024-08-15 17:33:26,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10300, loss[loss=0.0802, beats_loss=0.01139, ecapa_loss=0.0001776, whisper_loss=0.06704, over 18559.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001517, whisper_loss=0.09079, over 3911836.16 frames.
], batch size: 79, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:33:30,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3291220.0, ans=0.0
2024-08-15 17:33:34,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3291220.0, ans=0.125
2024-08-15 17:33:37,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3291220.0, ans=0.125
2024-08-15 17:33:55,518 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.265e-01
2024-08-15 17:33:58,055 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 26 from LS+wenet, 11 from Vox, 19 from AS
2024-08-15 17:34:09,589 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS
2024-08-15 17:34:12,531 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 from AS
2024-08-15 17:34:29,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291620.0, ans=0.1
2024-08-15 17:34:43,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10350, loss[loss=0.1001, beats_loss=0.0125, ecapa_loss=0.0001516, whisper_loss=0.08604, over 21580.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01057, ecapa_loss=0.0001508, whisper_loss=0.09156, over 3918564.32 frames.
], batch size: 88, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:34:48,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3291720.0, ans=0.0
2024-08-15 17:35:02,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.326e+01 2.650e+01 2.958e+01 2.904e+02, threshold=5.299e+01, percent-clipped=2.0
2024-08-15 17:35:10,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3291820.0, ans=0.0
2024-08-15 17:35:14,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3291920.0, ans=0.125
2024-08-15 17:35:15,515 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 from AS
2024-08-15 17:35:20,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3291920.0, ans=0.0
2024-08-15 17:35:20,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3291920.0, ans=0.125
2024-08-15 17:35:27,586 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS
2024-08-15 17:35:35,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3292020.0, ans=0.125
2024-08-15 17:36:00,059 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-15 17:36:00,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10400, loss[loss=0.1156, beats_loss=0.006589, ecapa_loss=0.0001616, whisper_loss=0.1074, over 17400.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001503, whisper_loss=0.09148, over 3888105.49 frames.
], batch size: 68, lr: 2.69e-03, grad_scale: 1.152921504606847e+18
2024-08-15 17:36:07,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3292220.0, ans=0.125
2024-08-15 17:36:10,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3292220.0, ans=0.0
2024-08-15 17:36:27,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0
2024-08-15 17:36:29,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3292420.0, ans=0.2
2024-08-15 17:36:32,067 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 from AS
2024-08-15 17:36:34,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0
2024-08-15 17:36:47,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.13 vs. limit=22.5
2024-08-15 17:37:13,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=12.0
2024-08-15 17:37:14,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10450, loss[loss=0.1028, beats_loss=0.01079, ecapa_loss=0.0001525, whisper_loss=0.09047, over 15059.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.0001492, whisper_loss=0.0913, over 3844882.63 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 1.152921504606847e+18
2024-08-15 17:37:27,268 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
29 from LS+wenet, 16 from Vox, 49 from AS
2024-08-15 17:37:31,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.170e+01 2.524e+01 2.860e+01 1.816e+02, threshold=5.048e+01, percent-clipped=1.0
2024-08-15 17:37:36,679 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 31 from Vox, 29 from AS
2024-08-15 17:37:42,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3292920.0, ans=0.125
2024-08-15 17:37:46,656 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 from AS
2024-08-15 17:37:53,130 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 31 from LS+wenet, 16 from Vox, 30 from AS
2024-08-15 17:37:56,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3292920.0, ans=0.1
2024-08-15 17:38:10,435 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 20 from Vox, 30 from AS
2024-08-15 17:38:24,016 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 from AS
2024-08-15 17:38:24,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3293120.0, ans=0.0
2024-08-15 17:38:25,485 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 22 from Vox, 17 from AS
2024-08-15 17:38:28,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10500, loss[loss=0.07191, beats_loss=0.01006, ecapa_loss=0.0002143, whisper_loss=0.05971, over 15803.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001501, whisper_loss=0.09163, over 3870541.40 frames.
], batch size: 67, lr: 2.69e-03, grad_scale: 1.152921504606847e+18
2024-08-15 17:38:30,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3293220.0, ans=0.0
2024-08-15 17:39:07,881 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 23 from Vox, 25 from AS
2024-08-15 17:39:15,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3293520.0, ans=0.0
2024-08-15 17:39:22,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3293520.0, ans=0.0
2024-08-15 17:39:29,769 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 from AS
2024-08-15 17:39:42,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10550, loss[loss=0.1111, beats_loss=0.009688, ecapa_loss=0.0001417, whisper_loss=0.09997, over 21947.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001498, whisper_loss=0.09092, over 3905461.19 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:39:50,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3293720.0, ans=0.125
2024-08-15 17:40:01,129 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.311e+01 2.619e+01 2.877e+01 4.261e+01, threshold=5.238e+01, percent-clipped=0.0
2024-08-15 17:40:02,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0
2024-08-15 17:40:11,617 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 from AS
2024-08-15 17:40:23,302 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
17 from LS+wenet, 25 from Vox, 27 from AS
2024-08-15 17:40:26,125 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS
2024-08-15 17:40:41,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3294120.0, ans=0.0
2024-08-15 17:40:42,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3294120.0, ans=0.125
2024-08-15 17:40:54,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10600, loss[loss=0.09742, beats_loss=0.01054, ecapa_loss=0.0001432, whisper_loss=0.08545, over 19228.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001504, whisper_loss=0.09084, over 3895712.24 frames. ], batch size: 75, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:40:57,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3294220.0, ans=0.125
2024-08-15 17:41:02,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3294220.0, ans=0.1
2024-08-15 17:41:07,608 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 23 from Vox, 31 from AS
2024-08-15 17:41:43,335 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 from AS
2024-08-15 17:41:52,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3294620.0, ans=0.125
2024-08-15 17:42:06,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10650, loss[loss=0.1063, beats_loss=0.008359, ecapa_loss=0.0001292, whisper_loss=0.0966, over 15481.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001502, whisper_loss=0.0908, over 3885015.11 frames.
], batch size: 57, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:42:15,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3294720.0, ans=0.0
2024-08-15 17:42:16,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3294720.0, ans=0.0
2024-08-15 17:42:22,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3294820.0, ans=0.1
2024-08-15 17:42:24,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.386e+01 2.620e+01 3.005e+01 5.015e+01, threshold=5.240e+01, percent-clipped=0.0
2024-08-15 17:42:27,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3294820.0, ans=0.0
2024-08-15 17:42:32,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3294820.0, ans=0.0
2024-08-15 17:42:34,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3294920.0, ans=10.0
2024-08-15 17:42:43,800 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 from AS
2024-08-15 17:42:47,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3294920.0, ans=0.125
2024-08-15 17:42:54,095 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 from AS
2024-08-15 17:43:19,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10700, loss[loss=0.1155, beats_loss=0.009936, ecapa_loss=0.0001357, whisper_loss=0.1042, over 19178.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001501, whisper_loss=0.09108, over 3917520.04 frames.
], batch size: 74, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:43:27,695 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 from AS
2024-08-15 17:43:59,888 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 28 from Vox, 27 from AS
2024-08-15 17:44:11,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3295520.0, ans=0.125
2024-08-15 17:44:13,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3295520.0, ans=0.125
2024-08-15 17:44:14,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3295520.0, ans=0.125
2024-08-15 17:44:17,061 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 from AS
2024-08-15 17:44:21,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3295620.0, ans=0.0
2024-08-15 17:44:22,577 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 from AS
2024-08-15 17:44:27,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3295620.0, ans=0.0
2024-08-15 17:44:32,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10750, loss[loss=0.1011, beats_loss=0.009742, ecapa_loss=0.0001491, whisper_loss=0.08984, over 17778.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001502, whisper_loss=0.0909, over 3885844.43 frames.
], batch size: 67, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:44:42,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3295720.0, ans=0.125
2024-08-15 17:44:47,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3295820.0, ans=0.125
2024-08-15 17:44:50,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.313e+01 2.649e+01 2.924e+01 4.383e+01, threshold=5.299e+01, percent-clipped=0.0
2024-08-15 17:45:06,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3295920.0, ans=0.125
2024-08-15 17:45:06,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3295920.0, ans=0.125
2024-08-15 17:45:24,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3296020.0, ans=0.125
2024-08-15 17:45:29,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3296120.0, ans=0.125
2024-08-15 17:45:44,508 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10800, loss[loss=0.1101, beats_loss=0.01066, ecapa_loss=0.0001283, whisper_loss=0.09821, over 23853.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001504, whisper_loss=0.09102, over 3868349.91 frames. ], batch size: 93, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:45:46,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0
2024-08-15 17:45:50,228 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
23 from LS+wenet, 20 from Vox, 32 from AS
2024-08-15 17:46:25,962 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS
2024-08-15 17:46:30,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3296520.0, ans=0.2
2024-08-15 17:46:34,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3296520.0, ans=0.125
2024-08-15 17:46:39,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3296520.0, ans=0.1
2024-08-15 17:46:54,316 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04891674965620041, model_norm_threshold=52.98820877075195
2024-08-15 17:46:54,489 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.34, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.961e+05, grad_sumsq=3.961e+05, orig_rms_sq=1.000e+00
2024-08-15 17:46:55,802 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10850, loss[loss=0.1214, beats_loss=0.005884, ecapa_loss=0.0001796, whisper_loss=0.1138, over 19051.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01051, ecapa_loss=0.000152, whisper_loss=0.09149, over 3888914.31 frames. ], batch size: 75, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:46:57,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3296720.0, ans=0.1
2024-08-15 17:47:09,856 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
31 from LS+wenet, 24 from Vox, 36 from AS
2024-08-15 17:47:13,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.310e+01 2.561e+01 2.919e+01 1.083e+03, threshold=5.121e+01, percent-clipped=1.0
2024-08-15 17:47:26,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0
2024-08-15 17:47:44,876 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 from AS
2024-08-15 17:47:50,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=22.5
2024-08-15 17:48:01,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3297120.0, ans=0.1
2024-08-15 17:48:08,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10900, loss[loss=0.08805, beats_loss=0.01265, ecapa_loss=0.00014, whisper_loss=0.074, over 21915.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.0001514, whisper_loss=0.09151, over 3923814.47 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 5.764607523034235e+17
2024-08-15 17:48:18,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3297220.0, ans=0.125
2024-08-15 17:48:32,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3297320.0, ans=0.125
2024-08-15 17:48:32,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3297320.0, ans=0.0
2024-08-15 17:48:36,276 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
31 from LS+wenet, 19 from Vox, 38 from AS
2024-08-15 17:48:39,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3297420.0, ans=0.125
2024-08-15 17:48:50,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3297520.0, ans=0.125
2024-08-15 17:48:54,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0
2024-08-15 17:48:59,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0
2024-08-15 17:49:09,564 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 from AS
2024-08-15 17:49:09,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3297620.0, ans=0.0
2024-08-15 17:49:19,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3297720.0, ans=0.0
2024-08-15 17:49:20,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10950, loss[loss=0.1165, beats_loss=0.00804, ecapa_loss=0.0001708, whisper_loss=0.1068, over 15465.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01047, ecapa_loss=0.0001513, whisper_loss=0.09204, over 3929187.18 frames.
], batch size: 63, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 17:49:22,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3297720.0, ans=0.05
2024-08-15 17:49:40,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.372e+01 2.632e+01 3.011e+01 4.357e+01, threshold=5.265e+01, percent-clipped=0.0
2024-08-15 17:50:00,516 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS
2024-08-15 17:50:06,337 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 from AS
2024-08-15 17:50:07,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3298020.0, ans=0.125
2024-08-15 17:50:17,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298120.0, ans=0.125
2024-08-15 17:50:24,689 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 from AS
2024-08-15 17:50:27,329 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 from AS
2024-08-15 17:50:27,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3298120.0, ans=0.1
2024-08-15 17:50:29,963 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 22 from Vox, 24 from AS
2024-08-15 17:50:31,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11000, loss[loss=0.09589, beats_loss=0.009742, ecapa_loss=0.0001792, whisper_loss=0.08435, over 15247.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001528, whisper_loss=0.09093, over 3897169.42 frames.
], batch size: 63, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 17:50:32,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298220.0, ans=0.1
2024-08-15 17:50:32,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3298220.0, ans=0.1
2024-08-15 17:50:41,039 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 31 from LS+wenet, 17 from Vox, 23 from AS
2024-08-15 17:51:16,869 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 22 from Vox, 39 from AS
2024-08-15 17:51:29,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3298620.0, ans=0.125
2024-08-15 17:51:33,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3298620.0, ans=0.0
2024-08-15 17:51:41,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11050, loss[loss=0.1052, beats_loss=0.01166, ecapa_loss=0.0001156, whisper_loss=0.09234, over 22199.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001519, whisper_loss=0.09045, over 3905539.04 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 17:51:47,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3298720.0, ans=0.0
2024-08-15 17:51:58,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3298820.0, ans=0.0
2024-08-15 17:52:03,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.406e+01 2.620e+01 2.867e+01 4.013e+01, threshold=5.239e+01, percent-clipped=0.0
2024-08-15 17:52:16,297 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
33 from LS+wenet, 26 from Vox, 30 from AS
2024-08-15 17:52:17,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3298920.0, ans=0.0
2024-08-15 17:52:22,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3298920.0, ans=0.125
2024-08-15 17:52:30,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3299020.0, ans=0.0
2024-08-15 17:52:36,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3299020.0, ans=0.1
2024-08-15 17:52:43,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3299120.0, ans=0.0
2024-08-15 17:52:54,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11100, loss[loss=0.1197, beats_loss=0.009376, ecapa_loss=0.0001543, whisper_loss=0.1087, over 22271.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001515, whisper_loss=0.09026, over 3924599.21 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 17:53:15,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2024-08-15 17:53:19,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0
2024-08-15 17:53:30,426 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 17 from Vox, 34 from AS
2024-08-15 17:54:08,309 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11150, loss[loss=0.09844, beats_loss=0.01033, ecapa_loss=0.0001607, whisper_loss=0.08651, over 17851.00 frames.
], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.000152, whisper_loss=0.08979, over 3891787.25 frames. ], batch size: 75, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:54:23,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3299820.0, ans=0.95 2024-08-15 17:54:28,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.377e+01 2.635e+01 3.031e+01 4.135e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-15 17:54:31,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3299820.0, ans=0.125 2024-08-15 17:54:44,446 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 11 from LS+wenet, 17 from Vox, 32 from AS 2024-08-15 17:54:44,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3299920.0, ans=0.0 2024-08-15 17:54:47,576 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 from AS 2024-08-15 17:55:02,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3300020.0, ans=0.125 2024-08-15 17:55:03,111 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS 2024-08-15 17:55:05,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-15 17:55:08,713 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 21 from Vox, 24 from AS 2024-08-15 17:55:16,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3300120.0, ans=0.125 2024-08-15 17:55:17,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3300120.0, ans=0.0 2024-08-15 17:55:19,093 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-15 17:55:20,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11200, loss[loss=0.1056, beats_loss=0.009485, ecapa_loss=0.0001563, whisper_loss=0.09457, over 22567.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001527, whisper_loss=0.09, over 3877997.51 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:55:36,065 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 from AS 2024-08-15 17:55:36,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3300320.0, ans=0.0 2024-08-15 17:55:47,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3300320.0, ans=0.04949747468305833 2024-08-15 17:55:54,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3300420.0, ans=0.0 2024-08-15 17:56:00,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3300420.0, ans=0.125 2024-08-15 17:56:06,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3300520.0, ans=0.125 2024-08-15 17:56:10,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3300520.0, ans=0.0 2024-08-15 17:56:11,697 INFO 
[train_multi_KD3.py:844] (3/4) A total of 79 cuts. 16 from LS+wenet, 21 from Vox, 42 from AS 2024-08-15 17:56:13,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-15 17:56:25,002 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 from AS 2024-08-15 17:56:27,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2024-08-15 17:56:33,527 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11250, loss[loss=0.106, beats_loss=0.01143, ecapa_loss=0.000137, whisper_loss=0.09318, over 14058.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001529, whisper_loss=0.09071, over 3902388.18 frames. ], batch size: 55, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:56:41,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2024-08-15 17:56:43,457 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 11 from Vox, 50 from AS 2024-08-15 17:56:49,067 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 from AS 2024-08-15 17:56:53,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.493e+01 2.758e+01 4.504e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-15 17:56:59,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3300820.0, ans=0.0 2024-08-15 17:57:03,622 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 from AS 2024-08-15 17:57:24,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3301020.0, ans=0.1 2024-08-15 17:57:24,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.44 vs. limit=6.0 2024-08-15 17:57:44,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11300, loss[loss=0.1055, beats_loss=0.01229, ecapa_loss=0.0001229, whisper_loss=0.09202, over 23276.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001514, whisper_loss=0.09113, over 3906179.96 frames. ], batch size: 91, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:57:48,086 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 from AS 2024-08-15 17:57:58,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3301320.0, ans=0.5 2024-08-15 17:58:00,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3301320.0, ans=0.0 2024-08-15 17:58:10,951 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS 2024-08-15 17:58:24,237 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 17:58:31,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. 
limit=15.0 2024-08-15 17:58:37,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3301520.0, ans=0.04949747468305833 2024-08-15 17:58:43,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3301620.0, ans=10.0 2024-08-15 17:58:43,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-15 17:58:48,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3301620.0, ans=0.0 2024-08-15 17:58:53,103 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 from AS 2024-08-15 17:58:57,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11350, loss[loss=0.08171, beats_loss=0.01252, ecapa_loss=0.0001504, whisper_loss=0.06769, over 21006.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001508, whisper_loss=0.09119, over 3914859.26 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 17:59:08,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301720.0, ans=0.1 2024-08-15 17:59:12,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3301820.0, ans=0.0 2024-08-15 17:59:17,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.304e+01 2.607e+01 2.921e+01 2.640e+02, threshold=5.213e+01, percent-clipped=2.0 2024-08-15 17:59:23,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.60 vs. 
limit=15.0 2024-08-15 17:59:33,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3301920.0, ans=0.125 2024-08-15 18:00:05,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3302120.0, ans=0.0 2024-08-15 18:00:07,198 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 14 from Vox, 23 from AS 2024-08-15 18:00:11,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11400, loss[loss=0.1236, beats_loss=0.009112, ecapa_loss=0.0001554, whisper_loss=0.1129, over 23329.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01035, ecapa_loss=0.0001507, whisper_loss=0.09269, over 3937124.70 frames. ], batch size: 94, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:00:17,603 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 20 from Vox, 49 from AS 2024-08-15 18:00:19,410 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 from AS 2024-08-15 18:00:19,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3302220.0, ans=0.125 2024-08-15 18:00:32,725 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-15 18:00:54,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=12.0 2024-08-15 18:00:56,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3302520.0, ans=0.0 2024-08-15 18:01:17,921 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 27 from Vox, 35 from AS 2024-08-15 18:01:24,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3302620.0, ans=0.05 2024-08-15 18:01:26,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11450, loss[loss=0.08328, beats_loss=0.008803, ecapa_loss=0.0001626, whisper_loss=0.07285, over 19890.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.000151, whisper_loss=0.09127, over 3938625.41 frames. ], batch size: 77, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:01:46,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.318e+01 2.537e+01 2.814e+01 4.367e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-15 18:01:50,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3302820.0, ans=0.1 2024-08-15 18:01:55,622 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 from AS 2024-08-15 18:02:23,101 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 18 from Vox, 43 from AS 2024-08-15 18:02:23,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3303120.0, ans=0.125 2024-08-15 18:02:26,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3303120.0, ans=0.125 2024-08-15 18:02:35,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3303120.0, ans=0.0 2024-08-15 18:02:35,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3303120.0, ans=0.125 2024-08-15 18:02:38,995 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11500, loss[loss=0.1217, beats_loss=0.009464, ecapa_loss=0.0001459, whisper_loss=0.1108, over 19263.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0104, ecapa_loss=0.0001511, whisper_loss=0.09187, over 3936359.19 frames. ], batch size: 74, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:02:42,903 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.004e-01 2024-08-15 18:02:48,351 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 from AS 2024-08-15 18:03:00,140 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 from AS 2024-08-15 18:03:06,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.33 vs. 
limit=22.5 2024-08-15 18:03:27,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3303520.0, ans=0.125 2024-08-15 18:03:27,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3303520.0, ans=0.0 2024-08-15 18:03:31,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-15 18:03:32,358 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS 2024-08-15 18:03:34,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3303520.0, ans=0.0 2024-08-15 18:03:35,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3303520.0, ans=0.125 2024-08-15 18:03:43,699 INFO [train_multi_KD3.py:844] (3/4) A total of 99 cuts. 22 from LS+wenet, 32 from Vox, 45 from AS 2024-08-15 18:03:45,399 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 38 from LS+wenet, 23 from Vox, 25 from AS 2024-08-15 18:03:50,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-15 18:03:52,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11550, loss[loss=0.09845, beats_loss=0.01186, ecapa_loss=0.0001343, whisper_loss=0.08525, over 18452.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01038, ecapa_loss=0.0001509, whisper_loss=0.09193, over 3927019.07 frames. ], batch size: 74, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:04:02,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.31 vs. 
limit=12.0 2024-08-15 18:04:03,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3303720.0, ans=0.0 2024-08-15 18:04:09,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3303820.0, ans=0.125 2024-08-15 18:04:09,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2024-08-15 18:04:12,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.433e+01 2.629e+01 2.861e+01 8.078e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-15 18:04:20,681 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-15 18:04:23,698 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 from AS 2024-08-15 18:04:31,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-08-15 18:04:40,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-15 18:04:49,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3304020.0, ans=0.125 2024-08-15 18:05:08,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2024-08-15 18:05:08,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11600, loss[loss=0.1052, beats_loss=0.01116, ecapa_loss=0.0001318, whisper_loss=0.09273, over 21957.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01039, ecapa_loss=0.0001501, whisper_loss=0.09167, over 3938821.37 frames. 
], batch size: 89, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:05:15,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3304220.0, ans=0.125 2024-08-15 18:05:18,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3304220.0, ans=0.1 2024-08-15 18:05:20,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3304220.0, ans=0.0 2024-08-15 18:05:55,302 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0450630709528923, model_norm_threshold=52.58251190185547 2024-08-15 18:05:55,486 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.906e+05, grad_sumsq=1.906e+05, orig_rms_sq=1.000e+00 2024-08-15 18:05:58,713 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS 2024-08-15 18:06:01,639 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 from AS 2024-08-15 18:06:07,348 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 from AS 2024-08-15 18:06:17,703 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 from AS 2024-08-15 18:06:20,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11650, loss[loss=0.08628, beats_loss=0.01138, ecapa_loss=0.0001405, whisper_loss=0.07349, over 19991.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01047, ecapa_loss=0.0001498, whisper_loss=0.09114, over 3962744.67 frames. 
], batch size: 81, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:06:20,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3304720.0, ans=0.0 2024-08-15 18:06:40,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.450e+01 2.772e+01 2.999e+01 1.167e+03, threshold=5.544e+01, percent-clipped=1.0 2024-08-15 18:06:45,176 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 from AS 2024-08-15 18:06:52,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3304920.0, ans=0.0 2024-08-15 18:06:56,574 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS 2024-08-15 18:07:05,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3305020.0, ans=0.125 2024-08-15 18:07:11,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-15 18:07:13,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3305020.0, ans=0.0 2024-08-15 18:07:21,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-15 18:07:26,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3305120.0, ans=0.2 2024-08-15 18:07:30,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3305220.0, ans=0.0 2024-08-15 18:07:31,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11700, loss[loss=0.1152, beats_loss=0.01082, ecapa_loss=0.0001323, whisper_loss=0.1031, over 22551.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01052, ecapa_loss=0.0001492, whisper_loss=0.09196, over 3979752.35 frames. ], batch size: 87, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:07:31,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3305220.0, ans=0.1 2024-08-15 18:07:35,662 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 from AS 2024-08-15 18:07:40,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3305220.0, ans=0.125 2024-08-15 18:07:48,496 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 from AS 2024-08-15 18:08:14,426 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 19 from Vox, 46 from AS 2024-08-15 18:08:18,787 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-15 18:08:26,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-08-15 18:08:28,719 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 from AS 2024-08-15 18:08:30,034 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 26 from Vox, 30 from AS 2024-08-15 18:08:43,483 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11750, loss[loss=0.09636, beats_loss=0.01326, ecapa_loss=0.0001317, whisper_loss=0.08179, over 21192.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01061, ecapa_loss=0.0001505, whisper_loss=0.09125, over 3978109.04 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:08:49,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.50 vs. 
limit=15.0 2024-08-15 18:09:03,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.291e+01 2.526e+01 2.838e+01 3.948e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-15 18:09:07,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3305820.0, ans=0.05 2024-08-15 18:09:34,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3306020.0, ans=0.0 2024-08-15 18:09:43,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-15 18:09:46,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3306120.0, ans=0.95 2024-08-15 18:09:55,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11800, loss[loss=0.0847, beats_loss=0.01327, ecapa_loss=0.0001322, whisper_loss=0.07011, over 19859.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01066, ecapa_loss=0.0001502, whisper_loss=0.09079, over 3957539.86 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:09:57,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3306220.0, ans=0.04949747468305833 2024-08-15 18:10:07,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3306220.0, ans=0.1 2024-08-15 18:10:17,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=15.0 2024-08-15 18:10:22,101 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 19 from Vox, 23 from AS 2024-08-15 18:10:43,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3306520.0, ans=0.05 2024-08-15 18:10:57,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3306620.0, ans=0.125 2024-08-15 18:11:03,046 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 18:11:08,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11850, loss[loss=0.08537, beats_loss=0.0123, ecapa_loss=0.000147, whisper_loss=0.0716, over 16100.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001507, whisper_loss=0.09088, over 3951614.69 frames. ], batch size: 68, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:11:25,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3306820.0, ans=0.125 2024-08-15 18:11:28,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.292e+01 2.620e+01 2.942e+01 3.993e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-15 18:11:44,431 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS 2024-08-15 18:12:14,396 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 from AS 2024-08-15 18:12:20,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11900, loss[loss=0.09532, beats_loss=0.007864, ecapa_loss=0.0001663, whisper_loss=0.0858, over 17603.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01062, ecapa_loss=0.0001498, whisper_loss=0.09114, over 3940049.45 frames. 
], batch size: 71, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:12:24,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3307220.0, ans=0.2 2024-08-15 18:12:29,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=12.0 2024-08-15 18:12:34,471 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 from AS 2024-08-15 18:12:37,446 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 32 from Vox, 33 from AS 2024-08-15 18:12:45,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3307320.0, ans=0.5 2024-08-15 18:12:49,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3307420.0, ans=0.0 2024-08-15 18:13:08,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3307520.0, ans=0.125 2024-08-15 18:13:18,559 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 from AS 2024-08-15 18:13:26,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3307620.0, ans=0.125 2024-08-15 18:13:31,578 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 19 from Vox, 17 from AS 2024-08-15 18:13:33,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11950, loss[loss=0.1017, beats_loss=0.007056, ecapa_loss=0.0001781, whisper_loss=0.0929, over 14352.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001508, whisper_loss=0.09122, over 3899073.51 frames. 
], batch size: 54, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:13:52,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.261e+01 2.658e+01 2.941e+01 1.221e+02, threshold=5.315e+01, percent-clipped=2.0 2024-08-15 18:14:21,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3308020.0, ans=0.125 2024-08-15 18:14:25,570 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 from AS 2024-08-15 18:14:30,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3308120.0, ans=0.05 2024-08-15 18:14:35,807 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS 2024-08-15 18:14:39,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3308120.0, ans=0.125 2024-08-15 18:14:44,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12000, loss[loss=0.1115, beats_loss=0.007317, ecapa_loss=0.000184, whisper_loss=0.1023, over 15571.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001498, whisper_loss=0.09128, over 3910746.50 frames. ], batch size: 61, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:14:44,630 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-15 18:15:24,372 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0.2463, over 922467.00 frames. 2024-08-15 18:15:43,485 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004172, beats_loss=0, ecapa_loss=0.0004172, whisper_loss=0, over 939242.00 frames. 
2024-08-15 18:17:41,600 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-15 18:17:41,604 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32143MB 2024-08-15 18:17:51,365 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 from AS 2024-08-15 18:17:51,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3308220.0, ans=0.2 2024-08-15 18:17:54,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-15 18:18:00,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3308320.0, ans=0.0 2024-08-15 18:18:16,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3308420.0, ans=0.0 2024-08-15 18:18:36,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=12.0 2024-08-15 18:18:48,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3308620.0, ans=0.125 2024-08-15 18:18:55,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12050, loss[loss=0.1082, beats_loss=0.01132, ecapa_loss=0.0001246, whisper_loss=0.09563, over 22173.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001498, whisper_loss=0.09077, over 3896495.12 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 2.8823037615171174e+17 2024-08-15 18:18:59,256 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
19 from LS+wenet, 18 from Vox, 24 from AS
2024-08-15 18:19:02,134 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS
2024-08-15 18:19:02,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3308720.0, ans=15.0
2024-08-15 18:19:05,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.16 vs. limit=10.0
2024-08-15 18:19:06,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3308720.0, ans=0.125
2024-08-15 18:19:13,907 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 15 from Vox, 35 from AS
2024-08-15 18:19:16,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.282e+01 2.556e+01 2.851e+01 3.972e+01, threshold=5.113e+01, percent-clipped=0.0
2024-08-15 18:19:36,245 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 from AS
2024-08-15 18:19:42,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=12.0
2024-08-15 18:19:46,870 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 25 from Vox, 25 from AS
2024-08-15 18:19:48,297 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS
2024-08-15 18:19:56,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3309120.0, ans=0.125
2024-08-15 18:19:56,298 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.96 vs.
limit=22.5
2024-08-15 18:20:10,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12100, loss[loss=0.1189, beats_loss=0.00706, ecapa_loss=0.0002044, whisper_loss=0.1097, over 21881.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001488, whisper_loss=0.0913, over 3882963.54 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:20:10,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3309220.0, ans=0.125
2024-08-15 18:20:16,439 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 22 from Vox, 25 from AS
2024-08-15 18:20:16,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3309220.0, ans=0.125
2024-08-15 18:20:30,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0
2024-08-15 18:20:39,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3309420.0, ans=0.125
2024-08-15 18:20:43,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. limit=10.0
2024-08-15 18:20:57,720 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 from AS
2024-08-15 18:21:05,687 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 18:21:24,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12150, loss[loss=0.1043, beats_loss=0.01115, ecapa_loss=0.0001377, whisper_loss=0.09182, over 16903.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001481, whisper_loss=0.09083, over 3869459.34 frames.
], batch size: 67, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:21:40,563 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 10 from Vox, 30 from AS
2024-08-15 18:21:46,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.187e+01 2.499e+01 2.897e+01 4.006e+01, threshold=4.998e+01, percent-clipped=0.0
2024-08-15 18:21:58,552 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS
2024-08-15 18:22:02,886 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 11 from Vox, 31 from AS
2024-08-15 18:22:26,793 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 from AS
2024-08-15 18:22:39,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12200, loss[loss=0.1213, beats_loss=0.00833, ecapa_loss=0.0001762, whisper_loss=0.1112, over 20659.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001476, whisper_loss=0.09108, over 3877791.48 frames. ], batch size: 85, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:23:17,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3310420.0, ans=0.0
2024-08-15 18:23:28,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3310520.0, ans=0.035
2024-08-15 18:23:48,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3310620.0, ans=0.125
2024-08-15 18:23:53,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0
2024-08-15 18:23:54,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12250, loss[loss=0.1188, beats_loss=0.008575, ecapa_loss=0.0001613, whisper_loss=0.1086, over 24413.00 frames.
], tot_loss[loss=0.1037, beats_loss=0.01049, ecapa_loss=0.0001496, whisper_loss=0.09166, over 3915166.62 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:24:07,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3310720.0, ans=0.125
2024-08-15 18:24:10,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3310820.0, ans=0.1
2024-08-15 18:24:11,526 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS
2024-08-15 18:24:15,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.354e+01 2.587e+01 2.883e+01 9.186e+01, threshold=5.174e+01, percent-clipped=2.0
2024-08-15 18:24:16,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.67 vs. limit=15.0
2024-08-15 18:24:32,449 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 24 from LS+wenet, 16 from Vox, 14 from AS
2024-08-15 18:24:35,388 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 18 from Vox, 40 from AS
2024-08-15 18:24:44,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3311020.0, ans=0.125
2024-08-15 18:25:07,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3311220.0, ans=0.125
2024-08-15 18:25:08,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12300, loss[loss=0.09521, beats_loss=0.0123, ecapa_loss=0.0001366, whisper_loss=0.08155, over 20319.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001497, whisper_loss=0.09144, over 3933945.48 frames.
], batch size: 83, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:25:09,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3311220.0, ans=0.0
2024-08-15 18:25:15,067 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 from AS
2024-08-15 18:25:19,681 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 from AS
2024-08-15 18:25:47,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3311420.0, ans=0.0
2024-08-15 18:26:19,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311620.0, ans=0.1
2024-08-15 18:26:24,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12350, loss[loss=0.09785, beats_loss=0.01092, ecapa_loss=0.0001591, whisper_loss=0.08534, over 20689.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001504, whisper_loss=0.09072, over 3907683.54 frames. ], batch size: 87, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:26:28,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3311720.0, ans=0.125
2024-08-15 18:26:29,216 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 from AS
2024-08-15 18:26:36,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.80 vs.
limit=12.0
2024-08-15 18:26:37,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3311720.0, ans=0.0
2024-08-15 18:26:44,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.423e+01 2.680e+01 3.098e+01 2.023e+02, threshold=5.359e+01, percent-clipped=1.0
2024-08-15 18:26:45,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3311820.0, ans=0.125
2024-08-15 18:26:48,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3311820.0, ans=0.125
2024-08-15 18:26:53,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3311920.0, ans=0.2
2024-08-15 18:26:54,220 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 31 from LS+wenet, 26 from Vox, 39 from AS
2024-08-15 18:27:07,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3312020.0, ans=0.125
2024-08-15 18:27:28,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3312120.0, ans=0.0
2024-08-15 18:27:36,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3312120.0, ans=0.09899494936611666
2024-08-15 18:27:36,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3312120.0, ans=0.0
2024-08-15 18:27:38,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12400, loss[loss=0.1042, beats_loss=0.009412, ecapa_loss=0.0001456, whisper_loss=0.0933, over 17030.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001509, whisper_loss=0.09079, over 3923028.67 frames.
], batch size: 66, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:27:39,007 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 31 from LS+wenet, 10 from Vox, 35 from AS
2024-08-15 18:27:43,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3312220.0, ans=0.0
2024-08-15 18:27:57,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3312320.0, ans=0.2
2024-08-15 18:28:00,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312320.0, ans=0.1
2024-08-15 18:28:08,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3312420.0, ans=0.125
2024-08-15 18:28:28,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3312520.0, ans=0.2
2024-08-15 18:28:30,829 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 14 from LS+wenet, 17 from Vox, 38 from AS
2024-08-15 18:28:41,008 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.401e+00
2024-08-15 18:28:52,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12450, loss[loss=0.1242, beats_loss=0.009046, ecapa_loss=0.0001545, whisper_loss=0.1136, over 23391.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001508, whisper_loss=0.09045, over 3916024.13 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:28:56,264 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts.
26 from LS+wenet, 28 from Vox, 30 from AS
2024-08-15 18:29:13,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.349e+01 2.594e+01 2.911e+01 3.951e+02, threshold=5.187e+01, percent-clipped=3.0
2024-08-15 18:29:16,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3312820.0, ans=0.07
2024-08-15 18:29:58,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0
2024-08-15 18:30:07,638 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12500, loss[loss=0.102, beats_loss=0.01104, ecapa_loss=0.0001557, whisper_loss=0.08946, over 21571.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001493, whisper_loss=0.09029, over 3939949.60 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:30:07,897 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 29 from Vox, 28 from AS
2024-08-15 18:30:14,105 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 from AS
2024-08-15 18:30:18,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3313220.0, ans=0.0
2024-08-15 18:30:21,447 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 23 from Vox, 29 from AS
2024-08-15 18:30:42,068 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 from AS
2024-08-15 18:30:46,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0
2024-08-15 18:30:49,934 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 19 from Vox, 25 from AS
2024-08-15 18:30:54,104 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts.
19 from LS+wenet, 20 from Vox, 24 from AS
2024-08-15 18:31:04,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3313520.0, ans=0.0
2024-08-15 18:31:14,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5
2024-08-15 18:31:21,730 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 from AS
2024-08-15 18:31:23,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12550, loss[loss=0.1071, beats_loss=0.01035, ecapa_loss=0.0001772, whisper_loss=0.09496, over 16200.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.00015, whisper_loss=0.09066, over 3908859.51 frames. ], batch size: 67, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:31:29,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3313720.0, ans=0.125
2024-08-15 18:31:44,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.273e+01 2.491e+01 2.694e+01 3.703e+01, threshold=4.981e+01, percent-clipped=0.0
2024-08-15 18:32:01,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3313920.0, ans=0.125
2024-08-15 18:32:13,371 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 from AS
2024-08-15 18:32:16,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3314020.0, ans=0.125
2024-08-15 18:32:23,752 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts.
23 from LS+wenet, 22 from Vox, 27 from AS
2024-08-15 18:32:28,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3314120.0, ans=0.125
2024-08-15 18:32:39,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12600, loss[loss=0.1206, beats_loss=0.01055, ecapa_loss=0.0001562, whisper_loss=0.1085, over 22857.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001511, whisper_loss=0.0908, over 3914283.70 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:32:53,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3314320.0, ans=0.125
2024-08-15 18:32:58,709 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08482968807220459, model_norm_threshold=49.81049346923828
2024-08-15 18:32:58,905 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.887e+04, grad_sumsq=6.887e+04, orig_rms_sq=1.000e+00
2024-08-15 18:33:02,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3314320.0, ans=0.2
2024-08-15 18:33:02,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3314320.0, ans=0.125
2024-08-15 18:33:02,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3314320.0, ans=0.125
2024-08-15 18:33:07,608 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 from AS
2024-08-15 18:33:14,950 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
24 from LS+wenet, 26 from Vox, 38 from AS
2024-08-15 18:33:18,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3314420.0, ans=0.1
2024-08-15 18:33:53,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12650, loss[loss=0.1197, beats_loss=0.009399, ecapa_loss=0.0001469, whisper_loss=0.1088, over 18295.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001509, whisper_loss=0.0914, over 3921527.77 frames. ], batch size: 68, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:34:00,629 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 from AS
2024-08-15 18:34:01,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0
2024-08-15 18:34:10,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0
2024-08-15 18:34:13,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.371e+01 2.618e+01 2.895e+01 5.872e+02, threshold=5.236e+01, percent-clipped=1.0
2024-08-15 18:34:15,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3314820.0, ans=0.0
2024-08-15 18:34:19,567 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts.
24 from LS+wenet, 15 from Vox, 34 from AS
2024-08-15 18:34:21,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3314920.0, ans=0.0
2024-08-15 18:34:23,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3314920.0, ans=0.125
2024-08-15 18:34:26,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3314920.0, ans=0.1
2024-08-15 18:34:27,567 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 from AS
2024-08-15 18:34:34,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0
2024-08-15 18:34:45,243 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 from AS
2024-08-15 18:35:00,064 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 from AS
2024-08-15 18:35:07,059 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12700, loss[loss=0.1336, beats_loss=0.008516, ecapa_loss=0.0001598, whisper_loss=0.1235, over 23131.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01058, ecapa_loss=0.0001506, whisper_loss=0.09185, over 3927858.42 frames.
], batch size: 88, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:35:07,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315220.0, ans=0.1
2024-08-15 18:35:09,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3315220.0, ans=0.05
2024-08-15 18:36:03,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3315520.0, ans=0.1
2024-08-15 18:36:13,329 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 from AS
2024-08-15 18:36:17,941 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 14 from Vox, 31 from AS
2024-08-15 18:36:22,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12750, loss[loss=0.1108, beats_loss=0.01023, ecapa_loss=0.0001108, whisper_loss=0.09946, over 16145.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001504, whisper_loss=0.09156, over 3928846.20 frames. ], batch size: 59, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:36:35,005 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 18:36:43,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.562e+01 2.837e+01 4.631e+01, threshold=5.124e+01, percent-clipped=0.0
2024-08-15 18:36:45,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3315820.0, ans=0.125
2024-08-15 18:36:45,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5
2024-08-15 18:36:50,685 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
30 from LS+wenet, 19 from Vox, 41 from AS
2024-08-15 18:36:58,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3315920.0, ans=0.1
2024-08-15 18:37:14,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3316020.0, ans=0.0
2024-08-15 18:37:17,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0
2024-08-15 18:37:20,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3316120.0, ans=0.0
2024-08-15 18:37:30,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3316120.0, ans=0.125
2024-08-15 18:37:36,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12800, loss[loss=0.09638, beats_loss=0.01195, ecapa_loss=0.0001442, whisper_loss=0.08299, over 21245.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01063, ecapa_loss=0.00015, whisper_loss=0.09141, over 3929043.99 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:37:44,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0
2024-08-15 18:37:54,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3316320.0, ans=0.125
2024-08-15 18:37:58,110 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
37 from LS+wenet, 22 from Vox, 32 from AS
2024-08-15 18:38:06,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3316420.0, ans=0.1
2024-08-15 18:38:12,189 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 from AS
2024-08-15 18:38:39,475 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 28 from Vox, 22 from AS
2024-08-15 18:38:47,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3316620.0, ans=0.0
2024-08-15 18:38:51,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316720.0, ans=0.1
2024-08-15 18:38:52,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12850, loss[loss=0.111, beats_loss=0.01066, ecapa_loss=0.000124, whisper_loss=0.0991, over 20273.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001516, whisper_loss=0.091, over 3882518.81 frames. ], batch size: 77, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:38:54,557 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 23 from Vox, 29 from AS
2024-08-15 18:39:00,915 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 from AS
2024-08-15 18:39:11,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-15 18:39:11,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3316820.0, ans=0.0
2024-08-15 18:39:13,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.324e+01 2.629e+01 2.874e+01 4.372e+01, threshold=5.259e+01, percent-clipped=0.0
2024-08-15 18:39:27,553 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
18 from LS+wenet, 16 from Vox, 24 from AS
2024-08-15 18:39:29,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3316920.0, ans=0.0
2024-08-15 18:39:34,837 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 18 from Vox, 36 from AS
2024-08-15 18:39:37,670 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 30 from Vox, 33 from AS
2024-08-15 18:39:40,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3317020.0, ans=0.1
2024-08-15 18:40:07,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12900, loss[loss=0.1036, beats_loss=0.01154, ecapa_loss=0.0001643, whisper_loss=0.09046, over 21690.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001507, whisper_loss=0.09062, over 3829195.08 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 2.8823037615171174e+17
2024-08-15 18:40:22,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3317320.0, ans=10.0
2024-08-15 18:40:44,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3317420.0, ans=0.125
2024-08-15 18:40:49,016 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 24 from Vox, 19 from AS
2024-08-15 18:40:53,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3317520.0, ans=0.0
2024-08-15 18:41:21,518 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 12950, loss[loss=0.1061, beats_loss=0.01141, ecapa_loss=0.0001252, whisper_loss=0.09346, over 21831.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0105, ecapa_loss=0.0001503, whisper_loss=0.09167, over 3843986.96 frames.
], batch size: 86, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 18:41:26,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3317720.0, ans=0.0
2024-08-15 18:41:31,190 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS
2024-08-15 18:41:39,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3317820.0, ans=0.125
2024-08-15 18:41:40,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.294e+01 2.551e+01 2.879e+01 4.880e+01, threshold=5.103e+01, percent-clipped=0.0
2024-08-15 18:42:34,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13000, loss[loss=0.1014, beats_loss=0.009348, ecapa_loss=0.0001727, whisper_loss=0.09031, over 17824.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001506, whisper_loss=0.09088, over 3863405.79 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 18:42:37,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.57 vs. limit=10.0
2024-08-15 18:42:42,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3318220.0, ans=0.125
2024-08-15 18:42:42,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3318220.0, ans=0.125
2024-08-15 18:43:11,971 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 from AS
2024-08-15 18:43:15,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.09 vs.
limit=6.0
2024-08-15 18:43:41,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3318620.0, ans=0.125
2024-08-15 18:43:48,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13050, loss[loss=0.1054, beats_loss=0.01078, ecapa_loss=0.0001644, whisper_loss=0.09293, over 21851.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001496, whisper_loss=0.09023, over 3844523.32 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17
2024-08-15 18:44:01,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0
2024-08-15 18:44:09,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.022e+01 2.364e+01 2.592e+01 2.928e+01 7.191e+01, threshold=5.184e+01, percent-clipped=1.0
2024-08-15 18:44:13,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3318820.0, ans=0.0
2024-08-15 18:44:21,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3318920.0, ans=0.2
2024-08-15 18:44:30,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3318920.0, ans=0.0
2024-08-15 18:44:55,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3319120.0, ans=0.0
2024-08-15 18:45:01,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0
2024-08-15 18:45:01,812 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13100, loss[loss=0.1074, beats_loss=0.01027, ecapa_loss=0.0001664, whisper_loss=0.09549, over 22583.00 frames.
], tot_loss[loss=0.1018, beats_loss=0.01073, ecapa_loss=0.0001475, whisper_loss=0.08962, over 3848456.09 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:45:05,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3319220.0, ans=0.125 2024-08-15 18:45:10,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2024-08-15 18:45:11,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3319220.0, ans=0.0 2024-08-15 18:45:45,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2024-08-15 18:45:53,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319520.0, ans=0.1 2024-08-15 18:45:56,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319520.0, ans=0.1 2024-08-15 18:46:06,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3319620.0, ans=0.2 2024-08-15 18:46:12,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3319620.0, ans=0.2 2024-08-15 18:46:19,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13150, loss[loss=0.08588, beats_loss=0.01292, ecapa_loss=0.000109, whisper_loss=0.07187, over 14207.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001485, whisper_loss=0.08969, over 3820226.04 frames. 
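The scaling.py:214 lines print ScheduledFloat values that vary with batch_count (skip rates annealed toward 0.0, balancer probs at 0.125, dropout at 0.1, and so on). A plausible sketch of such a schedule as piecewise-linear interpolation over batch count, clamped at the endpoints; the breakpoints below are illustrative, not the ones used in this run:

```python
def scheduled_float(batch_count, points):
    """Piecewise-linear schedule over batch_count.

    `points` is a list of (batch_count, value) breakpoints in increasing
    order. Values are clamped outside the breakpoint range. This mirrors
    the logged `ScheduledFloat: name=..., batch_count=..., ans=...` lines
    in spirit; the real class may differ in detail.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# Hypothetical skip-rate schedule: 0.3 at warm-up, decayed to 0.0.
skip_rate = scheduled_float(3318920.0, [(0.0, 0.3), (20000.0, 0.0)])
```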
], batch size: 56, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:46:19,855 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 11 from Vox, 26 from AS 2024-08-15 18:46:24,444 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 from AS 2024-08-15 18:46:36,858 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-15 18:46:41,414 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.374e+01 2.573e+01 2.884e+01 4.147e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-15 18:46:46,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3319820.0, ans=0.02 2024-08-15 18:46:50,492 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 from AS 2024-08-15 18:46:59,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3319920.0, ans=0.0 2024-08-15 18:47:24,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-15 18:47:33,597 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 from AS 2024-08-15 18:47:43,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13200, loss[loss=0.1137, beats_loss=0.007922, ecapa_loss=0.0001649, whisper_loss=0.1041, over 15824.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001486, whisper_loss=0.09002, over 3832535.01 frames. ], batch size: 65, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:47:45,447 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
22 from LS+wenet, 18 from Vox, 28 from AS 2024-08-15 18:47:46,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0 2024-08-15 18:47:52,326 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 27 from Vox, 16 from AS 2024-08-15 18:47:58,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3320220.0, ans=0.125 2024-08-15 18:48:00,746 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 15 from Vox, 50 from AS 2024-08-15 18:48:11,238 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 22 from Vox, 29 from AS 2024-08-15 18:48:11,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320320.0, ans=0.1 2024-08-15 18:48:14,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3320320.0, ans=0.0 2024-08-15 18:48:15,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3320420.0, ans=0.0 2024-08-15 18:48:29,236 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-15 18:48:31,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2024-08-15 18:48:36,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3320520.0, ans=0.125 2024-08-15 18:48:43,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. 
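The train_multi_KD3.py:844 lines summarize each mini-batch's provenance across the three training corpora ("fro AS" in the raw log is a typo for "from AS" in the format string). A sketch that reproduces the corrected summary line from per-cut source tags:

```python
from collections import Counter

def summarize_cut_sources(origins):
    """Build the per-batch provenance line from a list of source tags.

    `origins` holds one tag per cut, e.g. "LS+wenet", "Vox", or "AS".
    The function name is hypothetical; only the output format follows
    the log.
    """
    c = Counter(origins)
    return (f"A total of {len(origins)} cuts. "
            f"{c['LS+wenet']} from LS+wenet, {c['Vox']} from Vox, "
            f"{c['AS']} from AS")

line = summarize_cut_sources(["LS+wenet"] * 27 + ["Vox"] * 22 + ["AS"] * 39)
```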
limit=15.0 2024-08-15 18:49:06,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13250, loss[loss=0.09698, beats_loss=0.009097, ecapa_loss=0.0001671, whisper_loss=0.08622, over 22303.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001483, whisper_loss=0.09091, over 3874197.07 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:49:07,715 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 from AS 2024-08-15 18:49:30,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.301e+01 2.680e+01 3.189e+01 5.288e+01, threshold=5.359e+01, percent-clipped=1.0 2024-08-15 18:49:37,880 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-15 18:49:39,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320920.0, ans=0.1 2024-08-15 18:50:00,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3321020.0, ans=0.0 2024-08-15 18:50:22,859 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 14 from Vox, 36 from AS 2024-08-15 18:50:28,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3321220.0, ans=0.125 2024-08-15 18:50:29,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13300, loss[loss=0.1203, beats_loss=0.009838, ecapa_loss=0.0001607, whisper_loss=0.1089, over 22923.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.09049, over 3862854.23 frames. 
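The optim.py:476 lines report a five-number summary (min, Q1, median, Q3, max) of recent gradient norms, plus the percentage of batches whose norm exceeded the clipping threshold. How the threshold itself is derived is internal to the optimizer and not shown here; a sketch of just the reported statistics (function names hypothetical):

```python
import statistics

def grad_norm_quartiles(norms):
    """Five-number summary of a window of gradient norms.

    Returns [min, Q1, median, Q3, max], the same shape as the
    "grad-norm quartiles" entries in the log.
    """
    s = sorted(norms)
    q1, med, q3 = statistics.quantiles(s, n=4)  # exclusive method
    return [s[0], q1, med, q3, s[-1]]

def percent_clipped(norms, threshold):
    """Share of batches (in %) whose gradient norm exceeded the
    clipping threshold, as in the "percent-clipped" field."""
    return 100.0 * sum(n > threshold for n in norms) / len(norms)
```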
], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:50:48,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3321320.0, ans=0.09899494936611666 2024-08-15 18:51:06,905 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 from AS 2024-08-15 18:51:19,027 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 31 from LS+wenet, 16 from Vox, 30 from AS 2024-08-15 18:51:51,335 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 from AS 2024-08-15 18:51:55,936 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13350, loss[loss=0.08766, beats_loss=0.009468, ecapa_loss=0.0001423, whisper_loss=0.07677, over 14988.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001488, whisper_loss=0.09062, over 3870257.75 frames. ], batch size: 56, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:52:05,818 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 from AS 2024-08-15 18:52:11,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3321720.0, ans=0.0 2024-08-15 18:52:20,099 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 from AS 2024-08-15 18:52:21,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.271e+01 2.660e+01 2.983e+01 5.401e+01, threshold=5.319e+01, percent-clipped=1.0 2024-08-15 18:52:22,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=12.0 2024-08-15 18:52:24,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. 
limit=15.0 2024-08-15 18:52:32,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3321920.0, ans=0.0 2024-08-15 18:52:39,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3321920.0, ans=0.125 2024-08-15 18:52:42,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.66 vs. limit=5.0 2024-08-15 18:52:50,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3322020.0, ans=0.2 2024-08-15 18:53:17,319 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 from AS 2024-08-15 18:53:23,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13400, loss[loss=0.1068, beats_loss=0.01096, ecapa_loss=0.0001233, whisper_loss=0.09464, over 23083.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.000148, whisper_loss=0.09043, over 3871451.47 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:53:29,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3322220.0, ans=0.125 2024-08-15 18:53:41,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=12.0 2024-08-15 18:53:54,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0 2024-08-15 18:54:31,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3322620.0, ans=0.0 2024-08-15 18:54:36,083 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
25 from LS+wenet, 13 from Vox, 38 from AS 2024-08-15 18:54:36,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.071e-01 2024-08-15 18:54:50,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13450, loss[loss=0.1088, beats_loss=0.009818, ecapa_loss=0.0001391, whisper_loss=0.09755, over 23166.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001484, whisper_loss=0.09004, over 3904881.34 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:55:04,723 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-15 18:55:08,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3322820.0, ans=0.0 2024-08-15 18:55:14,893 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.424e+01 2.655e+01 2.945e+01 1.400e+03, threshold=5.311e+01, percent-clipped=0.0 2024-08-15 18:55:14,893 WARNING [optim.py:496] (3/4) Scaling gradients by 0.037934403866529465, model_norm_threshold=53.10542297363281 2024-08-15 18:55:15,079 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.249e+05, grad_sumsq=5.178e+07, orig_rms_sq=1.014e-02 2024-08-15 18:55:15,513 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS 2024-08-15 18:55:16,841 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 from AS 2024-08-15 18:55:17,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3322820.0, ans=0.125 2024-08-15 18:55:26,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.40 vs. 
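The WARNING at 18:55:14 shows classic norm-based clipping: when the total gradient norm exceeds model_norm_threshold, gradients are multiplied by threshold / norm (53.105 / 0.037934 gives a norm of about 1.4e+03, matching the max in the quartile line just before). The accompanying diagnostic attributes the blow-up to one parameter via grad_sumsq * orig_rms_sq. A sketch of both quantities (function names hypothetical):

```python
def grad_scale_factor(grad_norm, model_norm_threshold):
    """Factor applied to gradients when their norm exceeds the threshold.

    Below the threshold no scaling is applied; above it, gradients are
    shrunk by threshold / norm, which is the value printed in the
    "Scaling gradients by ..." warning.
    """
    if grad_norm <= model_norm_threshold:
        return 1.0
    return model_norm_threshold / grad_norm

def dominant_sumsq(grad_sumsq, orig_rms_sq):
    """Per-parameter contribution reported in the "Parameter dominating
    tot_sumsq" diagnostic: dominant_sumsq = grad_sumsq * orig_rms_sq."""
    return grad_sumsq * orig_rms_sq
```

For the logged event: dominant_sumsq(5.178e+07, 1.014e-02) is about 5.25e+05, matching the printed 5.249e+05 to rounding.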
limit=22.5 2024-08-15 18:56:16,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13500, loss[loss=0.1068, beats_loss=0.01034, ecapa_loss=0.0001204, whisper_loss=0.09527, over 20217.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001508, whisper_loss=0.08988, over 3914828.15 frames. ], batch size: 77, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:56:18,409 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 24 from Vox, 35 from AS 2024-08-15 18:56:25,445 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 from AS 2024-08-15 18:56:33,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3323320.0, ans=0.1 2024-08-15 18:56:38,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2024-08-15 18:56:39,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-15 18:56:41,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2024-08-15 18:56:45,910 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 from AS 2024-08-15 18:57:24,143 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 14 from Vox, 35 from AS 2024-08-15 18:57:35,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3323620.0, ans=0.2 2024-08-15 18:57:44,430 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13550, loss[loss=0.1032, beats_loss=0.01213, ecapa_loss=0.0001505, whisper_loss=0.08956, over 21257.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001503, whisper_loss=0.08997, over 3903033.88 frames. ], batch size: 85, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:57:46,558 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 20 from Vox, 42 from AS 2024-08-15 18:57:48,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3323720.0, ans=0.125 2024-08-15 18:58:01,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.52 vs. limit=10.0 2024-08-15 18:58:08,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.273e+01 2.505e+01 2.907e+01 8.129e+01, threshold=5.010e+01, percent-clipped=4.0 2024-08-15 18:58:46,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3324020.0, ans=0.125 2024-08-15 18:58:56,458 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS 2024-08-15 18:59:10,022 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13600, loss[loss=0.1207, beats_loss=0.009449, ecapa_loss=0.0001636, whisper_loss=0.1097, over 23071.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.00015, whisper_loss=0.091, over 3903483.47 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 18:59:52,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3324420.0, ans=0.0 2024-08-15 19:00:08,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3324520.0, ans=0.125 2024-08-15 19:00:34,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13650, loss[loss=0.09512, beats_loss=0.01085, ecapa_loss=0.000144, whisper_loss=0.08283, over 14129.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001499, whisper_loss=0.09141, over 3892280.38 frames. ], batch size: 55, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 19:00:35,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3324720.0, ans=0.1 2024-08-15 19:00:49,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3324720.0, ans=0.125 2024-08-15 19:00:58,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.294e+01 2.538e+01 2.832e+01 8.240e+01, threshold=5.075e+01, percent-clipped=1.0 2024-08-15 19:01:03,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3324820.0, ans=0.125 2024-08-15 19:01:10,726 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 from AS 2024-08-15 19:01:12,116 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 from AS 2024-08-15 19:01:22,569 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 from AS 2024-08-15 19:01:37,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3325020.0, ans=0.125 2024-08-15 19:01:48,174 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 from AS 2024-08-15 19:01:49,855 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 from AS 2024-08-15 19:01:59,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13700, loss[loss=0.09505, beats_loss=0.008886, ecapa_loss=0.0001326, whisper_loss=0.08483, over 16739.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001493, whisper_loss=0.09116, over 3870989.48 frames. 
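The tot_loss[... over N frames] entries aggregate per-batch losses weighted by the number of frames each was computed over, with N hovering near 3.9M frames throughout the epoch, which suggests older batches are gradually forgotten. One plausible sketch of such a frame-weighted running average; the decay factor and function name are assumptions, not taken from the training code:

```python
def merge_tot_loss(tot_loss, tot_frames, batch_loss, batch_frames,
                   decay=0.999):
    """Frame-weighted running average of the loss.

    Past statistics are down-weighted by `decay` each step, so the
    effective frame count saturates instead of growing without bound,
    consistent with the roughly constant "over N frames" in the log.
    """
    frames = decay * tot_frames + batch_frames
    loss = (decay * tot_frames * tot_loss
            + batch_frames * batch_loss) / frames
    return loss, frames

# Seed from nothing, then fold in one batch.
loss, frames = merge_tot_loss(0.0, 0.0, 0.1, 100.0)
```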
], batch size: 63, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 19:02:23,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3325320.0, ans=0.125 2024-08-15 19:02:24,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3325320.0, ans=0.2 2024-08-15 19:02:30,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3325320.0, ans=0.2 2024-08-15 19:02:30,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3325320.0, ans=0.1 2024-08-15 19:02:31,308 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS 2024-08-15 19:02:31,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3325320.0, ans=0.125 2024-08-15 19:02:45,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.12 vs. limit=22.5 2024-08-15 19:31:39,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3325620.0, ans=0.2 2024-08-15 19:45:37,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3325620.0, ans=0.0 2024-08-15 19:59:43,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13750, loss[loss=0.1033, beats_loss=0.01153, ecapa_loss=0.0001315, whisper_loss=0.09046, over 16722.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001491, whisper_loss=0.09125, over 3881074.79 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 20:44:29,341 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 19 from Vox, 36 from AS 2024-08-15 20:47:28,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.373e+01 2.597e+01 2.828e+01 1.512e+02, threshold=5.195e+01, percent-clipped=2.0 2024-08-15 22:06:33,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2024-08-15 22:18:27,945 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 from AS 2024-08-15 22:26:23,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3326120.0, ans=0.2 2024-08-15 22:40:27,672 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 from AS 2024-08-15 22:41:52,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13800, loss[loss=0.09357, beats_loss=0.009637, ecapa_loss=0.0001654, whisper_loss=0.08228, over 22943.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001499, whisper_loss=0.09098, over 3886115.88 frames. 
], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-15 23:01:45,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3326220.0, ans=0.09899494936611666 2024-08-15 23:37:17,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3326320.0, ans=0.0 2024-08-16 00:19:55,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3326520.0, ans=0.2 2024-08-16 00:22:30,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3326520.0, ans=0.125 2024-08-16 00:27:04,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3326520.0, ans=0.2 2024-08-16 00:33:49,566 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 from AS 2024-08-16 00:36:12,865 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-16 01:12:17,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13850, loss[loss=0.1129, beats_loss=0.009983, ecapa_loss=0.0001391, whisper_loss=0.1015, over 23282.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01062, ecapa_loss=0.0001484, whisper_loss=0.09086, over 3889040.99 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 01:25:30,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=22.5 2024-08-16 01:48:18,318 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
21 from LS+wenet, 24 from Vox, 39 from AS 2024-08-16 01:55:15,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.228e+01 2.445e+01 2.757e+01 2.786e+02, threshold=4.891e+01, percent-clipped=1.0 2024-08-16 02:25:19,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3326920.0, ans=0.125 2024-08-16 02:40:28,167 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-16 02:54:32,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3327020.0, ans=0.125 2024-08-16 03:47:18,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13900, loss[loss=0.1115, beats_loss=0.01019, ecapa_loss=0.0001556, whisper_loss=0.0998, over 22164.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001494, whisper_loss=0.09096, over 3929922.36 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 03:48:48,467 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 from AS 2024-08-16 04:54:22,985 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS 2024-08-16 04:57:15,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3327420.0, ans=0.1 2024-08-16 05:03:45,935 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-16 05:15:28,440 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 from AS 2024-08-16 06:16:02,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 13950, loss[loss=0.1197, beats_loss=0.009525, ecapa_loss=0.0001419, whisper_loss=0.1088, over 16737.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001485, whisper_loss=0.09068, over 3896201.78 frames. ], batch size: 64, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 06:23:01,669 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 from AS 2024-08-16 06:54:17,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3327820.0, ans=0.125 2024-08-16 06:57:05,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3327820.0, ans=0.125 2024-08-16 06:57:05,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.387e+01 2.658e+01 3.040e+01 4.567e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-16 07:04:05,487 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.973e+00 2024-08-16 07:09:26,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3327820.0, ans=0.125 2024-08-16 07:22:45,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3327920.0, ans=0.125 2024-08-16 07:37:49,802 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 from AS 2024-08-16 07:41:47,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3327920.0, ans=0.125 2024-08-16 07:48:11,550 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 from AS 2024-08-16 08:01:47,398 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS 2024-08-16 08:11:40,596 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
10 from LS+wenet, 17 from Vox, 26 from AS 2024-08-16 08:41:08,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0 2024-08-16 09:02:46,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 14000, loss[loss=0.1005, beats_loss=0.01246, ecapa_loss=0.0001245, whisper_loss=0.08676, over 22820.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001473, whisper_loss=0.09016, over 3901547.37 frames. ], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-16 09:21:35,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3328220.0, ans=0.0 2024-08-16 09:27:07,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3328220.0, ans=0.0 2024-08-16 09:28:55,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3328320.0, ans=0.125 2024-08-16 09:41:00,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328320.0, ans=0.1 2024-08-16 09:43:01,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-16 09:44:13,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3328320.0, ans=0.0 2024-08-16 10:22:35,706 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-16 10:28:58,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3328520.0, ans=0.0 2024-08-16 10:35:09,950 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 15 from Vox, 36 from AS