2024-08-19 16:40:02,058 INFO [train_multi_KD3.py:1188] (2/4) Training started
2024-08-19 16:40:02,058 INFO [train_multi_KD3.py:1198] (2/4) Device: cuda:2
2024-08-19 16:40:02,058 INFO [train_multi_KD3.py:1214] (2/4) Using dtype=torch.bfloat16
2024-08-19 16:40:02,058 INFO [train_multi_KD3.py:1216] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '3210a8ed-dirty', 'icefall-git-date': 'Mon Aug 19 16:16:48 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-19 16:40:02,058 INFO [train_multi_KD3.py:1218] (2/4) About to create model
2024-08-19 16:40:02,411 INFO [model_shift.py:142] (2/4) Delta_t: 6 when computing the distillation loss
2024-08-19 16:40:02,416 INFO [train_multi_KD3.py:1222] (2/4) Number of model parameters: 66484678
2024-08-19 16:40:02,416 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-30.pt
2024-08-19 16:40:04,670 INFO [train_multi_KD3.py:1237] (2/4) Using DDP
2024-08-19 16:40:06,069 INFO [train_multi_KD3.py:1249] (2/4) Loading optimizer state dict
2024-08-19 16:40:06,341 INFO [train_multi_KD3.py:1257] (2/4) Loading scheduler state dict
2024-08-19 16:40:06,342 INFO [kd_datamodule.py:690] (2/4) About to get train 960 cuts
2024-08-19 16:40:06,384 INFO [kd_datamodule.py:862] (2/4) About to get the voxceleb cuts.
2024-08-19 16:40:06,385 INFO [kd_datamodule.py:873] (2/4) Adding voxceleb2 cuts.
2024-08-19 16:40:06,387 INFO [train_multi_KD3.py:1320] (2/4) Getting audioset cuts
2024-08-19 16:40:06,387 INFO [kd_datamodule.py:881] (2/4) About to get the audioset cuts for KD.
2024-08-19 16:40:06,389 INFO [train_multi_KD3.py:1326] (2/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-19 16:40:14,352 INFO [train_multi_KD3.py:1328] (2/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1187704) [underlying data type: ], CutSet(len=1904746) [underlying data type: ]]
2024-08-19 16:40:14,352 INFO [train_multi_KD3.py:1329] (2/4) Using weights: [1406195, 1187704, 1904746]
2024-08-19 16:40:14,352 INFO [train_multi_KD3.py:1338] (2/4) CutSet(len=4498645) [underlying data type: ]
2024-08-19 16:40:14,352 INFO [kd_datamodule.py:449] (2/4) Disable MUSAN
2024-08-19 16:40:14,354 INFO [kd_datamodule.py:489] (2/4) Disable SpecAugment
2024-08-19 16:40:14,354 INFO [kd_datamodule.py:491] (2/4) About to create train dataset
2024-08-19 16:40:14,354 INFO [kd_datamodule.py:528] (2/4) Using SimpleCutSampler
2024-08-19 16:40:14,354 INFO [kd_datamodule.py:536] (2/4) About to create train dataloader
2024-08-19 16:40:14,356 INFO [kd_datamodule.py:756] (2/4) About to get dev-clean cuts
2024-08-19 16:40:14,358 INFO [kd_datamodule.py:774] (2/4) About to get dev-other cuts
2024-08-19 16:40:14,359 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-19 16:40:14,640 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-19 16:40:14,640 INFO [kd_datamodule.py:833] (2/4) About to get the test set of voxceleb1 set.
2024-08-19 16:40:14,641 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-19 16:40:14,871 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-19 16:40:14,871 INFO [kd_datamodule.py:893] (2/4) About to get the audioset eval cuts.
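The mux entries above report per-source weights equal to the CutSet lengths, i.e. sources are sampled in proportion to their size. A minimal sketch of that weighting (assumption: this mirrors the logged behavior, not the actual lhotse `CutSet.mux` implementation):

```python
# Lengths of the three muxed CutSets, taken from the log above
# (LibriSpeech repeated 5-fold, VoxCeleb2, AudioSet-unbalanced).
lens = {"librispeech_5fold": 1406195, "voxceleb2": 1187704, "audioset": 1904746}

# Weights equal to lengths => sampling probability proportional to size.
total = sum(lens.values())
probs = {name: n / total for name, n in lens.items()}

print(total)                        # 4498645, matching CutSet(len=4498645)
print(round(probs["audioset"], 3))  # AudioSet is drawn ~42% of the time
```

This explains why the per-batch cut statistics later in the log (e.g. "14 from LS+wenet, 28 from Vox, 49 from AS") are roughly in the ratio of the three corpus sizes.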
2024-08-19 16:40:14,872 INFO [kd_datamodule.py:570] (2/4) About to create dev dataset
2024-08-19 16:40:15,377 INFO [kd_datamodule.py:591] (2/4) About to create dev dataloader
2024-08-19 16:40:15,377 INFO [train_multi_KD3.py:1418] (2/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-19 16:40:15,377 INFO [train_multi_KD3.py:1422] (2/4) Loading grad scaler state dict
2024-08-19 16:40:31,725 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 0, loss[loss=0.08721, beats_loss=0.01125, ecapa_loss=0.000119, whisper_loss=0.07477, over 21476.00 frames. ], tot_loss[loss=0.08721, beats_loss=0.01125, ecapa_loss=0.000119, whisper_loss=0.07477, over 21476.00 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:40:31,725 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss
2024-08-19 16:41:05,543 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2478, over 931116.00 frames.
2024-08-19 16:41:25,477 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003992, beats_loss=0, ecapa_loss=0.0003992, whisper_loss=0, over 944235.00 frames.
2024-08-19 16:42:59,858 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-19 16:42:59,860 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB
2024-08-19 16:43:00,095 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 14 from LS+wenet, 28 from Vox, 49 from AS
2024-08-19 16:43:01,915 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5
2024-08-19 16:43:44,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4445990.0, ans=0.125
2024-08-19 16:43:55,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4446090.0, ans=0.125
2024-08-19 16:44:43,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.746e+01 3.133e+01 8.282e+01, threshold=5.492e+01, percent-clipped=1.0
2024-08-19 16:44:53,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4446290.0, ans=0.2
2024-08-19 16:45:00,514 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 50, loss[loss=0.109, beats_loss=0.00824, ecapa_loss=0.0001501, whisper_loss=0.09927, over 22448.00 frames. ], tot_loss[loss=0.09741, beats_loss=0.00926, ecapa_loss=0.0001497, whisper_loss=0.08665, over 854334.22 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:45:09,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0
2024-08-19 16:45:11,367 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4446390.0, ans=0.04949747468305833
2024-08-19 16:45:18,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4446390.0, ans=0.125
2024-08-19 16:45:24,283 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 21 from LS+wenet, 16 from Vox, 48 from AS
2024-08-19 16:45:30,925 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 from AS
2024-08-19 16:45:44,482 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 16:45:44,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4446590.0, ans=0.07
2024-08-19 16:46:55,260 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 100, loss[loss=0.1073, beats_loss=0.008185, ecapa_loss=0.0001533, whisper_loss=0.09758, over 19229.00 frames. ], tot_loss[loss=0.09714, beats_loss=0.009453, ecapa_loss=0.0001473, whisper_loss=0.08622, over 1491901.98 frames. ], batch size: 77, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:46:55,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4446890.0, ans=0.125
2024-08-19 16:48:03,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4447190.0, ans=0.07
2024-08-19 16:48:07,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4447190.0, ans=0.125
2024-08-19 16:48:24,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.615e+01 2.787e+01 3.101e+01 5.493e+01, threshold=5.575e+01, percent-clipped=1.0
2024-08-19 16:48:27,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4447290.0, ans=0.2
2024-08-19 16:48:40,688 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 150, loss[loss=0.07508, beats_loss=0.009873, ecapa_loss=0.0001311, whisper_loss=0.0639, over 14254.00 frames. ], tot_loss[loss=0.09902, beats_loss=0.009374, ecapa_loss=0.000146, whisper_loss=0.08818, over 1989628.14 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:49:03,286 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS
2024-08-19 16:49:10,452 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 from AS
2024-08-19 16:49:14,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4447490.0, ans=0.0
2024-08-19 16:49:19,073 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 from AS
2024-08-19 16:49:37,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4447690.0, ans=0.0
2024-08-19 16:49:58,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4447790.0, ans=0.0
2024-08-19 16:50:13,649 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 200, loss[loss=0.1081, beats_loss=0.009685, ecapa_loss=0.000151, whisper_loss=0.09691, over 23923.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009547, ecapa_loss=0.0001457, whisper_loss=0.08931, over 2407556.47 frames. ], batch size: 94, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:50:26,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4447890.0, ans=0.125
2024-08-19 16:50:28,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4447890.0, ans=0.0
2024-08-19 16:50:31,529 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 from AS
2024-08-19 16:50:46,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-08-19 16:50:51,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0
2024-08-19 16:51:11,540 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 16 from LS+wenet, 29 from Vox, 29 from AS
2024-08-19 16:51:16,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4448190.0, ans=0.0
2024-08-19 16:51:21,529 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 from AS
2024-08-19 16:51:24,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.343e+01 2.586e+01 2.857e+01 5.487e+01, threshold=5.172e+01, percent-clipped=0.0
2024-08-19 16:51:32,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0
2024-08-19 16:51:33,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4448290.0, ans=0.1
2024-08-19 16:51:37,997 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 250, loss[loss=0.1128, beats_loss=0.009165, ecapa_loss=0.0001651, whisper_loss=0.102, over 17920.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.009729, ecapa_loss=0.0001444, whisper_loss=0.08961, over 2693782.26 frames. ], batch size: 74, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:51:53,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4448490.0, ans=0.0
2024-08-19 16:51:59,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4448490.0, ans=0.125
2024-08-19 16:52:25,412 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 32 from LS+wenet, 13 from Vox, 38 from AS
2024-08-19 16:52:43,545 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 from AS
2024-08-19 16:52:48,751 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS
2024-08-19 16:53:03,192 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 300, loss[loss=0.09868, beats_loss=0.01029, ecapa_loss=0.0001199, whisper_loss=0.08719, over 21323.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.009995, ecapa_loss=0.0001422, whisper_loss=0.08877, over 2912529.99 frames. ], batch size: 83, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:53:06,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4448890.0, ans=0.025
2024-08-19 16:53:09,524 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 27 from Vox, 25 from AS
2024-08-19 16:53:14,612 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 from AS
2024-08-19 16:53:32,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4448990.0, ans=0.1
2024-08-19 16:53:37,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4449090.0, ans=0.0
2024-08-19 16:53:46,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4449090.0, ans=0.125
2024-08-19 16:53:49,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4449190.0, ans=0.0
2024-08-19 16:53:51,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0
2024-08-19 16:53:55,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4449190.0, ans=0.1
2024-08-19 16:54:08,196 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 from AS
2024-08-19 16:54:11,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.292e+01 2.578e+01 2.922e+01 3.653e+02, threshold=5.156e+01, percent-clipped=3.0
2024-08-19 16:54:23,843 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 350, loss[loss=0.1056, beats_loss=0.00949, ecapa_loss=0.0001395, whisper_loss=0.09469, over 19238.00 frames. ], tot_loss[loss=0.09946, beats_loss=0.0103, ecapa_loss=0.0001422, whisper_loss=0.08774, over 3094709.53 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:54:33,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4449390.0, ans=0.125
2024-08-19 16:54:53,067 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:55:21,191 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 17 from LS+wenet, 11 from Vox, 32 from AS
2024-08-19 16:55:29,262 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 from AS
2024-08-19 16:55:36,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4449790.0, ans=0.125
2024-08-19 16:55:39,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4449790.0, ans=0.1
2024-08-19 16:55:42,103 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 from AS
2024-08-19 16:55:43,247 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 400, loss[loss=0.0894, beats_loss=0.01075, ecapa_loss=0.0001683, whisper_loss=0.07697, over 17774.00 frames. ], tot_loss[loss=0.09913, beats_loss=0.01028, ecapa_loss=0.0001418, whisper_loss=0.08744, over 3227712.56 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:56:00,620 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 from AS
2024-08-19 16:56:06,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4449990.0, ans=0.0
2024-08-19 16:56:07,996 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 from AS
2024-08-19 16:56:25,053 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. limit=10.0
2024-08-19 16:56:27,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4450090.0, ans=0.125
2024-08-19 16:56:31,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4450190.0, ans=0.1
2024-08-19 16:56:38,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0
2024-08-19 16:56:40,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4450190.0, ans=0.0
2024-08-19 16:56:44,548 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 from AS
2024-08-19 16:56:52,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.167e+01 2.459e+01 2.674e+01 3.849e+01, threshold=4.918e+01, percent-clipped=0.0
2024-08-19 16:56:53,863 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 from AS
2024-08-19 16:57:03,515 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08908264338970184, model_norm_threshold=49.18006896972656
2024-08-19 16:57:03,679 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.931e+04, grad_sumsq=3.931e+04, orig_rms_sq=1.000e+00
2024-08-19 16:57:05,398 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 450, loss[loss=0.08455, beats_loss=0.01268, ecapa_loss=0.0001417, whisper_loss=0.07046, over 21519.00 frames. ], tot_loss[loss=0.09942, beats_loss=0.01033, ecapa_loss=0.0001414, whisper_loss=0.08767, over 3357818.84 frames. ], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:57:55,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4450690.0, ans=0.125
2024-08-19 16:58:13,512 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 from AS
2024-08-19 16:58:21,806 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 11 from Vox, 29 from AS
2024-08-19 16:58:28,190 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 500, loss[loss=0.1014, beats_loss=0.009276, ecapa_loss=0.0001699, whisper_loss=0.09045, over 19332.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01028, ecapa_loss=0.000141, whisper_loss=0.08863, over 3460729.41 frames. ], batch size: 79, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:58:37,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4450890.0, ans=0.2
2024-08-19 16:58:54,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4450990.0, ans=0.2
2024-08-19 16:59:01,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.74 vs. limit=22.5
2024-08-19 16:59:14,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4451090.0, ans=0.125
2024-08-19 16:59:36,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.363e+01 2.706e+01 3.045e+01 5.521e+02, threshold=5.412e+01, percent-clipped=1.0
2024-08-19 16:59:40,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4451290.0, ans=0.0
2024-08-19 16:59:49,871 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 550, loss[loss=0.0696, beats_loss=0.009805, ecapa_loss=0.0001371, whisper_loss=0.05842, over 16352.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01022, ecapa_loss=0.0001402, whisper_loss=0.08911, over 3505966.16 frames. ], batch size: 64, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:59:55,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4451390.0, ans=0.05
2024-08-19 17:00:15,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4451490.0, ans=0.0
2024-08-19 17:00:18,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=4451490.0, ans=0.1
2024-08-19 17:00:34,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4451590.0, ans=0.125
2024-08-19 17:00:38,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4451590.0, ans=0.2
2024-08-19 17:00:40,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4451590.0, ans=0.125
2024-08-19 17:00:57,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4451690.0, ans=0.125
2024-08-19 17:00:59,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4451690.0, ans=0.2
2024-08-19 17:01:19,515 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2024-08-19 17:01:27,126 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 600, loss[loss=0.1093, beats_loss=0.009524, ecapa_loss=0.0001516, whisper_loss=0.09823, over 18568.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01024, ecapa_loss=0.0001409, whisper_loss=0.08896, over 3579343.24 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:01:33,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4451890.0, ans=0.0
2024-08-19 17:01:37,229 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 from AS
2024-08-19 17:01:39,081 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 from AS
2024-08-19 17:02:00,306 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 from AS
2024-08-19 17:02:07,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4452090.0, ans=0.1
2024-08-19 17:02:10,042 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 from AS
2024-08-19 17:02:17,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4452190.0, ans=0.125
2024-08-19 17:02:18,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0
2024-08-19 17:02:20,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4452190.0, ans=0.125
2024-08-19 17:02:38,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.243e+01 2.476e+01 2.748e+01 6.280e+01, threshold=4.953e+01, percent-clipped=2.0
2024-08-19 17:02:40,302 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 24 from LS+wenet, 32 from Vox, 26 from AS
2024-08-19 17:02:46,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.75 vs. limit=22.5
2024-08-19 17:02:53,185 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 650, loss[loss=0.1145, beats_loss=0.009146, ecapa_loss=0.0001561, whisper_loss=0.1038, over 21746.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0102, ecapa_loss=0.0001422, whisper_loss=0.08906, over 3647474.58 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:03:14,400 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS
2024-08-19 17:03:20,865 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 from AS
2024-08-19 17:03:46,152 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 from AS
2024-08-19 17:03:55,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4452690.0, ans=0.125
2024-08-19 17:03:58,543 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 from AS
2024-08-19 17:04:15,504 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 32 from LS+wenet, 17 from Vox, 33 from AS
2024-08-19 17:04:17,930 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 700, loss[loss=0.1133, beats_loss=0.008499, ecapa_loss=0.0001331, whisper_loss=0.1034, over 15348.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01021, ecapa_loss=0.0001419, whisper_loss=0.08995, over 3690121.95 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:04:21,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4452890.0, ans=0.125
2024-08-19 17:04:24,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4452890.0, ans=0.1
2024-08-19 17:04:55,330 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 15 from LS+wenet, 13 from Vox, 34 from AS
2024-08-19 17:05:27,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.284e+01 2.551e+01 2.853e+01 6.068e+01, threshold=5.102e+01, percent-clipped=1.0
2024-08-19 17:05:31,379 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 27 from Vox, 26 from AS
2024-08-19 17:05:37,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4453290.0, ans=0.125
2024-08-19 17:05:41,199 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 750, loss[loss=0.09758, beats_loss=0.01031, ecapa_loss=0.0001365, whisper_loss=0.08591, over 19252.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01024, ecapa_loss=0.0001406, whisper_loss=0.08991, over 3700921.05 frames. ], batch size: 78, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:05:41,409 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS
2024-08-19 17:06:16,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0
2024-08-19 17:06:27,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0
2024-08-19 17:06:29,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4453590.0, ans=0.125
2024-08-19 17:06:45,515 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 25 from Vox, 27 from AS
2024-08-19 17:07:07,285 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 800, loss[loss=0.1088, beats_loss=0.00786, ecapa_loss=0.0001553, whisper_loss=0.09939, over 18248.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01029, ecapa_loss=0.0001397, whisper_loss=0.08967, over 3727649.90 frames. ], batch size: 70, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:07:11,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4453890.0, ans=0.1
2024-08-19 17:07:14,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4453890.0, ans=0.125
2024-08-19 17:07:28,052 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS
2024-08-19 17:07:28,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4453990.0, ans=0.07
2024-08-19 17:07:34,867 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 from AS
2024-08-19 17:07:37,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4453990.0, ans=0.04949747468305833
2024-08-19 17:07:38,411 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS
2024-08-19 17:07:43,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4454090.0, ans=0.2
2024-08-19 17:07:52,357 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4454090.0, ans=0.125
2024-08-19 17:08:07,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4454190.0, ans=0.125
2024-08-19 17:08:09,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4454190.0, ans=0.2
2024-08-19 17:08:19,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.286e+01 2.525e+01 2.905e+01 4.318e+01, threshold=5.049e+01, percent-clipped=0.0
2024-08-19 17:08:19,951 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 12 from LS+wenet, 26 from Vox, 28 from AS
2024-08-19 17:08:32,951 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 850, loss[loss=0.1001, beats_loss=0.01015, ecapa_loss=0.0001371, whisper_loss=0.08855, over 17123.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01034, ecapa_loss=0.0001393, whisper_loss=0.08848, over 3738358.45 frames. ], batch size: 67, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:08:34,740 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts.
22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 17:08:40,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4454390.0, ans=0.125 2024-08-19 17:09:28,280 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.595e-01 2024-08-19 17:09:31,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4454690.0, ans=0.1 2024-08-19 17:09:34,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-19 17:09:35,589 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:09:54,975 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-19 17:09:59,485 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 900, loss[loss=0.08713, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.07516, over 18782.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01029, ecapa_loss=0.0001395, whisper_loss=0.08833, over 3727159.35 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:10:06,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4454890.0, ans=0.035 2024-08-19 17:10:29,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4454990.0, ans=0.125 2024-08-19 17:10:41,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. 
limit=15.0 2024-08-19 17:10:46,558 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 17:10:50,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4455190.0, ans=0.0 2024-08-19 17:10:52,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4455190.0, ans=0.2 2024-08-19 17:10:53,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4455190.0, ans=0.125 2024-08-19 17:10:55,231 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:11:06,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=22.5 2024-08-19 17:11:09,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4455290.0, ans=0.0 2024-08-19 17:11:09,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4455290.0, ans=0.0 2024-08-19 17:11:11,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.270e+01 2.580e+01 3.222e+01 2.488e+02, threshold=5.161e+01, percent-clipped=3.0 2024-08-19 17:11:24,786 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 950, loss[loss=0.1113, beats_loss=0.009191, ecapa_loss=0.0001693, whisper_loss=0.1004, over 22022.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01035, ecapa_loss=0.0001389, whisper_loss=0.08828, over 3774206.00 frames. 
], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:11:56,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4455590.0, ans=0.125 2024-08-19 17:11:58,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 17:12:00,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4455590.0, ans=0.1 2024-08-19 17:12:02,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-19 17:12:06,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4455590.0, ans=0.125 2024-08-19 17:12:08,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 17:12:11,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4455590.0, ans=0.125 2024-08-19 17:12:16,682 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 17:12:25,111 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 17:12:41,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4455790.0, ans=0.2 2024-08-19 17:12:49,077 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1000, loss[loss=0.1184, beats_loss=0.008584, ecapa_loss=0.0001284, whisper_loss=0.1085, over 20619.00 frames. ], tot_loss[loss=0.09944, beats_loss=0.01042, ecapa_loss=0.0001379, whisper_loss=0.08764, over 3767429.65 frames. 
], batch size: 77, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:12:58,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2024-08-19 17:13:00,445 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2024-08-19 17:13:03,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4455890.0, ans=0.0 2024-08-19 17:13:15,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4455990.0, ans=0.0 2024-08-19 17:13:23,700 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 17:13:23,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4456090.0, ans=0.125 2024-08-19 17:13:45,652 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 17:13:58,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4456290.0, ans=0.0 2024-08-19 17:13:59,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.231e+01 2.578e+01 2.915e+01 8.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 17:14:02,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=15.0 2024-08-19 17:14:03,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4456290.0, ans=0.125 2024-08-19 17:14:10,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2024-08-19 17:14:12,621 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1050, loss[loss=0.07042, beats_loss=0.0105, ecapa_loss=0.0001222, whisper_loss=0.0587, over 14316.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01034, ecapa_loss=0.0001387, whisper_loss=0.08816, over 3766949.00 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:14:14,917 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 17:14:16,812 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-19 17:14:18,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4456390.0, ans=0.125 2024-08-19 17:14:31,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4456490.0, ans=0.2 2024-08-19 17:14:40,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4456490.0, ans=0.125 2024-08-19 17:14:53,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4456590.0, ans=0.125 2024-08-19 17:15:02,612 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-19 17:15:24,019 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
29 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-19 17:15:27,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4456790.0, ans=0.125 2024-08-19 17:15:37,030 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1100, loss[loss=0.1151, beats_loss=0.0085, ecapa_loss=0.0001542, whisper_loss=0.1051, over 19397.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01029, ecapa_loss=0.0001392, whisper_loss=0.08861, over 3771384.06 frames. ], batch size: 76, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:15:41,064 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 17:15:43,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4456890.0, ans=0.125 2024-08-19 17:16:10,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4457090.0, ans=0.0 2024-08-19 17:16:27,143 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-19 17:16:30,635 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 17:16:34,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4457190.0, ans=0.0 2024-08-19 17:16:35,565 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 17:16:45,397 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
21 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 17:16:48,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.222e+01 2.508e+01 2.804e+01 3.305e+02, threshold=5.015e+01, percent-clipped=2.0 2024-08-19 17:16:49,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4457290.0, ans=0.2 2024-08-19 17:17:01,929 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1150, loss[loss=0.09109, beats_loss=0.009519, ecapa_loss=0.0001259, whisper_loss=0.08031, over 19971.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01028, ecapa_loss=0.0001387, whisper_loss=0.08909, over 3771890.09 frames. ], batch size: 81, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:17:12,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4457390.0, ans=0.04949747468305833 2024-08-19 17:17:26,944 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 26 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-19 17:17:28,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4457490.0, ans=0.125 2024-08-19 17:17:29,918 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 17:17:38,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4457590.0, ans=0.1 2024-08-19 17:17:40,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4457590.0, ans=0.125 2024-08-19 17:17:41,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4457590.0, ans=0.125 2024-08-19 17:17:56,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4457690.0, ans=0.125 2024-08-19 17:17:56,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4457690.0, ans=0.0 2024-08-19 17:18:26,400 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1200, loss[loss=0.1127, beats_loss=0.01078, ecapa_loss=0.0001347, whisper_loss=0.1006, over 22693.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01034, ecapa_loss=0.0001385, whisper_loss=0.08921, over 3789045.85 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:18:27,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4457890.0, ans=0.1 2024-08-19 17:18:32,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4457890.0, ans=0.125 2024-08-19 17:18:33,740 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 17:18:50,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4457990.0, ans=0.125 2024-08-19 17:18:55,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-08-19 17:19:13,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4458090.0, ans=0.125 2024-08-19 17:19:13,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4458090.0, ans=0.1 2024-08-19 17:19:20,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2024-08-19 17:19:23,198 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 17:19:26,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4458190.0, ans=0.2 2024-08-19 17:19:36,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.677e+01 2.243e+01 2.469e+01 2.791e+01 3.736e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-19 17:19:42,058 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 17:19:50,492 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1250, loss[loss=0.1123, beats_loss=0.009908, ecapa_loss=0.0001134, whisper_loss=0.1012, over 20262.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001387, whisper_loss=0.08942, over 3762428.27 frames. 
], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:19:50,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4458390.0, ans=0.0 2024-08-19 17:20:46,960 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 17:21:00,870 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 34 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 17:21:08,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4458790.0, ans=0.0 2024-08-19 17:21:15,040 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1300, loss[loss=0.09088, beats_loss=0.01185, ecapa_loss=0.0001064, whisper_loss=0.07796, over 23577.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0103, ecapa_loss=0.0001388, whisper_loss=0.08947, over 3776053.82 frames. ], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:21:15,214 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 17:21:25,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-19 17:21:43,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2024-08-19 17:21:55,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.25 vs. 
limit=22.5 2024-08-19 17:22:00,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4459090.0, ans=0.07 2024-08-19 17:22:04,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4459190.0, ans=0.125 2024-08-19 17:22:21,637 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:22:23,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.156e+01 2.325e+01 2.640e+01 4.207e+01, threshold=4.651e+01, percent-clipped=0.0 2024-08-19 17:22:34,518 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 34 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 17:22:37,395 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1350, loss[loss=0.09588, beats_loss=0.008503, ecapa_loss=0.0001821, whisper_loss=0.08556, over 16597.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01032, ecapa_loss=0.000139, whisper_loss=0.089, over 3782063.79 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:22:52,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4459490.0, ans=0.125 2024-08-19 17:23:07,633 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 17:23:14,592 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 17:23:46,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:54,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:57,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:58,958 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 17:23:59,858 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1400, loss[loss=0.1038, beats_loss=0.01015, ecapa_loss=0.0001193, whisper_loss=0.09247, over 16896.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01026, ecapa_loss=0.0001393, whisper_loss=0.08898, over 3751924.85 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:24:01,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2024-08-19 17:24:16,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4459990.0, ans=0.1 2024-08-19 17:24:31,457 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 17:25:08,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4460290.0, ans=0.0 2024-08-19 17:25:11,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.246e+01 2.446e+01 2.816e+01 8.915e+01, threshold=4.891e+01, percent-clipped=2.0 2024-08-19 17:25:18,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4460290.0, ans=0.125 2024-08-19 17:25:18,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4460290.0, ans=0.125 2024-08-19 17:25:19,345 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 15 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-19 17:25:24,146 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1450, loss[loss=0.1223, beats_loss=0.009719, ecapa_loss=0.0001254, whisper_loss=0.1113, over 19921.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01028, ecapa_loss=0.0001384, whisper_loss=0.08897, over 3766114.57 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:25:25,048 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-19 17:25:25,914 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 17:25:57,504 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 25 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-19 17:26:03,548 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 17:26:03,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4460590.0, ans=0.125 2024-08-19 17:26:08,473 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 9 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 17:26:46,815 WARNING [optim.py:496] (2/4) Scaling gradients by 0.051883164793252945, model_norm_threshold=48.91460418701172 2024-08-19 17:26:46,978 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.482e+04, grad_sumsq=2.578e+04, orig_rms_sq=3.290e+00 2024-08-19 17:26:53,315 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1500, loss[loss=0.09996, beats_loss=0.01089, ecapa_loss=0.0001285, whisper_loss=0.08779, over 12549.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01043, ecapa_loss=0.0001378, whisper_loss=0.08835, over 3772262.64 frames. ], batch size: 51, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:27:06,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4460890.0, ans=0.1 2024-08-19 17:27:41,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461090.0, ans=0.1 2024-08-19 17:27:57,049 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 17:28:09,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.242e+01 2.515e+01 2.868e+01 9.428e+02, threshold=5.031e+01, percent-clipped=1.0 2024-08-19 17:28:10,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461290.0, ans=0.1 2024-08-19 17:28:17,886 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
36 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 17:28:22,817 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1550, loss[loss=0.0941, beats_loss=0.008436, ecapa_loss=0.0001334, whisper_loss=0.08433, over 19252.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01034, ecapa_loss=0.0001385, whisper_loss=0.08893, over 3770911.91 frames. ], batch size: 74, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:28:26,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4461390.0, ans=0.0 2024-08-19 17:28:31,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4461390.0, ans=0.125 2024-08-19 17:28:33,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4461390.0, ans=0.2 2024-08-19 17:28:37,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4461390.0, ans=0.2 2024-08-19 17:28:40,792 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 17:28:41,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4461490.0, ans=0.125 2024-08-19 17:28:50,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4461490.0, ans=0.125 2024-08-19 17:28:51,372 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 17:29:21,504 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
30 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 17:29:28,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4461690.0, ans=0.2 2024-08-19 17:29:36,547 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461790.0, ans=0.1 2024-08-19 17:29:43,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4461790.0, ans=0.2 2024-08-19 17:29:48,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4461890.0, ans=0.0 2024-08-19 17:29:50,172 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1600, loss[loss=0.09563, beats_loss=0.01072, ecapa_loss=9.122e-05, whisper_loss=0.08399, over 13956.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.0001385, whisper_loss=0.08864, over 3739582.25 frames. ], batch size: 52, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:29:57,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4461890.0, ans=0.125 2024-08-19 17:30:02,295 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 17:30:04,313 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2024-08-19 17:30:15,629 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 17:30:17,862 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 17:30:39,212 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 17:31:00,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4462290.0, ans=0.125 2024-08-19 17:31:03,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.243e+01 2.444e+01 2.645e+01 4.310e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-19 17:31:07,385 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 17:31:08,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2024-08-19 17:31:17,909 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1650, loss[loss=0.1137, beats_loss=0.009753, ecapa_loss=0.0001407, whisper_loss=0.1025, over 18978.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01028, ecapa_loss=0.0001391, whisper_loss=0.08888, over 3745267.97 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:31:23,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4462390.0, ans=0.125 2024-08-19 17:31:31,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4462390.0, ans=0.125 2024-08-19 17:31:54,099 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 17:32:05,409 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 17:32:27,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4462790.0, ans=0.2 2024-08-19 17:32:36,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4462790.0, ans=0.0 2024-08-19 17:32:43,016 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1700, loss[loss=0.09663, beats_loss=0.01024, ecapa_loss=0.000133, whisper_loss=0.08506, over 16620.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.0001387, whisper_loss=0.08875, over 3772711.51 frames. ], batch size: 63, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:32:49,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4462890.0, ans=0.125 2024-08-19 17:32:49,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4462890.0, ans=0.125 2024-08-19 17:33:12,899 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
32 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 17:33:36,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4463190.0, ans=0.1 2024-08-19 17:33:52,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4463290.0, ans=0.0 2024-08-19 17:33:54,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.265e+01 2.450e+01 2.804e+01 4.783e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-19 17:34:05,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4463290.0, ans=0.1 2024-08-19 17:34:08,174 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1750, loss[loss=0.08953, beats_loss=0.01252, ecapa_loss=0.0001314, whisper_loss=0.07569, over 20651.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0103, ecapa_loss=0.0001392, whisper_loss=0.0892, over 3782470.33 frames. ], batch size: 85, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:34:08,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4463390.0, ans=0.0 2024-08-19 17:34:16,723 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 17:34:19,895 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 17:34:23,000 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 17:34:23,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4463490.0, ans=0.125 2024-08-19 17:34:28,502 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 17:35:08,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.18 vs. limit=10.0 2024-08-19 17:35:18,079 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-19 17:35:31,773 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1800, loss[loss=0.03733, beats_loss=0.01033, ecapa_loss=0.0001833, whisper_loss=0.02517, over 13797.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01036, ecapa_loss=0.0001385, whisper_loss=0.08886, over 3751717.65 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:35:46,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4463990.0, ans=0.05 2024-08-19 17:36:24,455 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 17:36:25,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4464190.0, ans=0.2 2024-08-19 17:36:25,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-08-19 17:36:30,799 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 27 from LS+wenet, 11 from Vox, 18 fro AS 2024-08-19 17:36:40,214 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.242e+01 2.530e+01 2.809e+01 4.955e+01, threshold=5.060e+01, percent-clipped=1.0 2024-08-19 17:36:53,834 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1850, loss[loss=0.105, beats_loss=0.009941, ecapa_loss=0.0001439, whisper_loss=0.09357, over 21707.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001369, whisper_loss=0.08925, over 3770376.27 frames. 
], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:37:01,605 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-19 17:37:04,323 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 17:37:07,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4464390.0, ans=0.1 2024-08-19 17:37:42,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4464690.0, ans=0.1 2024-08-19 17:37:51,020 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 19 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-19 17:38:03,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4464790.0, ans=0.125 2024-08-19 17:38:17,774 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1900, loss[loss=0.1034, beats_loss=0.01016, ecapa_loss=0.0001518, whisper_loss=0.09168, over 21067.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01035, ecapa_loss=0.0001374, whisper_loss=0.08888, over 3756248.78 frames. ], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:38:55,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4465090.0, ans=0.0 2024-08-19 17:38:56,449 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 17 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-19 17:38:57,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4465090.0, ans=0.125 2024-08-19 17:39:04,683 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 
22 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 17:39:12,368 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 16 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-19 17:39:28,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.273e+01 2.509e+01 2.742e+01 5.984e+01, threshold=5.017e+01, percent-clipped=1.0 2024-08-19 17:39:35,056 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 17:39:41,894 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 1950, loss[loss=0.0835, beats_loss=0.01002, ecapa_loss=0.0001974, whisper_loss=0.07151, over 19557.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.08897, over 3731197.91 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:39:43,716 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-19 17:39:44,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4465390.0, ans=0.125 2024-08-19 17:40:23,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0 2024-08-19 17:40:28,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.58 vs. limit=22.5 2024-08-19 17:40:31,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4465590.0, ans=0.1 2024-08-19 17:40:33,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. 
limit=15.0 2024-08-19 17:40:34,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4465690.0, ans=0.05 2024-08-19 17:40:37,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4465690.0, ans=0.125 2024-08-19 17:40:42,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4465690.0, ans=0.2 2024-08-19 17:40:43,866 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 17:40:49,666 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2024-08-19 17:40:57,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4465790.0, ans=0.125 2024-08-19 17:41:07,329 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2000, loss[loss=0.08974, beats_loss=0.01144, ecapa_loss=0.0001518, whisper_loss=0.07678, over 22130.00 frames. ], tot_loss[loss=0.09984, beats_loss=0.01045, ecapa_loss=0.0001377, whisper_loss=0.08802, over 3743023.62 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:41:19,371 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
21 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 17:41:26,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4465990.0, ans=0.1 2024-08-19 17:42:00,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4466190.0, ans=0.125 2024-08-19 17:42:02,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0 2024-08-19 17:42:18,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.386e+01 2.592e+01 2.882e+01 2.246e+02, threshold=5.185e+01, percent-clipped=4.0 2024-08-19 17:42:19,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4466290.0, ans=0.09899494936611666 2024-08-19 17:42:31,613 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2050, loss[loss=0.09601, beats_loss=0.01063, ecapa_loss=0.0001359, whisper_loss=0.08402, over 22280.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01045, ecapa_loss=0.0001361, whisper_loss=0.08826, over 3742351.07 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:42:31,807 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 19 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 17:42:32,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4466390.0, ans=0.025 2024-08-19 17:42:42,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. 
limit=15.0 2024-08-19 17:42:45,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4466390.0, ans=0.5 2024-08-19 17:42:49,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4466490.0, ans=0.0 2024-08-19 17:42:52,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4466490.0, ans=0.1 2024-08-19 17:42:59,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4466490.0, ans=0.0 2024-08-19 17:43:08,973 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 33 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 17:43:18,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4466590.0, ans=0.125 2024-08-19 17:43:44,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4466790.0, ans=0.125 2024-08-19 17:43:50,413 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 17:43:58,912 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2100, loss[loss=0.09593, beats_loss=0.0136, ecapa_loss=0.0001123, whisper_loss=0.08121, over 23429.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01046, ecapa_loss=0.0001363, whisper_loss=0.08845, over 3756194.56 frames. 
], batch size: 94, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:44:18,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4466990.0, ans=0.2 2024-08-19 17:44:25,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4466990.0, ans=0.125 2024-08-19 17:44:44,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4467090.0, ans=0.0 2024-08-19 17:44:52,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4467190.0, ans=0.125 2024-08-19 17:44:57,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4467190.0, ans=0.07 2024-08-19 17:45:07,534 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 14 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 17:45:10,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.301e+01 2.618e+01 2.880e+01 6.452e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 17:45:14,479 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.721e-03 2024-08-19 17:45:17,587 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 17:45:24,093 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2150, loss[loss=0.1048, beats_loss=0.009472, ecapa_loss=0.0001174, whisper_loss=0.09418, over 14616.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0105, ecapa_loss=0.0001366, whisper_loss=0.08824, over 3753451.27 frames. 
], batch size: 55, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:45:43,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4467490.0, ans=0.125 2024-08-19 17:45:49,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4467490.0, ans=0.0 2024-08-19 17:46:04,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4467590.0, ans=0.0 2024-08-19 17:46:22,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4467690.0, ans=0.0 2024-08-19 17:46:39,329 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.390e+00 2024-08-19 17:46:51,888 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2200, loss[loss=0.0983, beats_loss=0.01184, ecapa_loss=0.0001235, whisper_loss=0.08522, over 22984.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01048, ecapa_loss=0.0001367, whisper_loss=0.08842, over 3752737.81 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:46:52,035 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 17:47:35,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4468090.0, ans=0.05 2024-08-19 17:48:01,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4468290.0, ans=0.125 2024-08-19 17:48:04,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.319e+01 2.611e+01 2.846e+01 3.358e+02, threshold=5.223e+01, percent-clipped=1.0 2024-08-19 17:48:09,743 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
8 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 17:48:17,616 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2250, loss[loss=0.1017, beats_loss=0.009079, ecapa_loss=0.00016, whisper_loss=0.09106, over 16344.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001372, whisper_loss=0.08943, over 3746586.06 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:48:26,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4468390.0, ans=0.0 2024-08-19 17:48:43,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4468490.0, ans=0.1 2024-08-19 17:48:56,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-19 17:49:03,728 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 17:49:04,379 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.201e+05 2024-08-19 17:49:04,735 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2024-08-19 17:49:26,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4468790.0, ans=0.0 2024-08-19 17:49:42,743 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2300, loss[loss=0.09124, beats_loss=0.01211, ecapa_loss=0.0001358, whisper_loss=0.07777, over 19399.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001381, whisper_loss=0.08964, over 3755767.26 frames. 
], batch size: 80, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:49:48,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4468890.0, ans=0.0 2024-08-19 17:49:51,605 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 17:50:00,468 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 17:50:29,144 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 17:50:32,958 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 17:50:54,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.331e+01 2.598e+01 2.979e+01 4.563e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-19 17:51:02,033 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-08-19 17:51:07,668 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2350, loss[loss=0.09348, beats_loss=0.009503, ecapa_loss=0.0001605, whisper_loss=0.08237, over 17148.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001415, whisper_loss=0.09004, over 3774861.26 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:51:13,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4469390.0, ans=0.0 2024-08-19 17:51:27,991 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 17:52:00,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4469690.0, ans=0.0 2024-08-19 17:52:11,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-08-19 17:52:23,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4469790.0, ans=0.125 2024-08-19 17:52:33,337 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2400, loss[loss=0.09124, beats_loss=0.007766, ecapa_loss=0.0001747, whisper_loss=0.08173, over 14283.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001401, whisper_loss=0.08996, over 3768225.59 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:52:50,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4469990.0, ans=0.125 2024-08-19 17:52:53,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4469990.0, ans=0.125 2024-08-19 17:52:57,244 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 17:53:05,981 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 17:53:11,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4470090.0, ans=0.1 2024-08-19 17:53:15,433 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. 
limit=15.0 2024-08-19 17:53:19,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4470090.0, ans=0.125 2024-08-19 17:53:35,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-19 17:53:44,454 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 17:53:46,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.352e+01 2.498e+01 2.702e+01 6.582e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-19 17:53:49,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4470290.0, ans=0.2 2024-08-19 17:53:49,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4470290.0, ans=0.07 2024-08-19 17:53:49,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4470290.0, ans=0.0 2024-08-19 17:54:01,060 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2450, loss[loss=0.1191, beats_loss=0.00888, ecapa_loss=0.0001653, whisper_loss=0.1086, over 21048.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001384, whisper_loss=0.08911, over 3765669.02 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:54:07,262 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 17:54:17,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.47 vs. 
limit=15.0 2024-08-19 17:54:34,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4470490.0, ans=0.2 2024-08-19 17:54:37,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2024-08-19 17:54:46,434 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 17:54:48,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4470590.0, ans=0.0 2024-08-19 17:55:11,774 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 17:55:28,491 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2500, loss[loss=0.1198, beats_loss=0.008138, ecapa_loss=0.0001523, whisper_loss=0.1102, over 23381.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001385, whisper_loss=0.09026, over 3780648.76 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:55:30,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4470890.0, ans=0.1 2024-08-19 17:56:08,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4471090.0, ans=0.1 2024-08-19 17:56:16,740 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 17:56:17,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4471090.0, ans=0.1 2024-08-19 17:56:19,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4471190.0, ans=0.5 2024-08-19 17:56:28,086 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 17:56:40,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.297e+01 2.536e+01 2.855e+01 4.497e+01, threshold=5.072e+01, percent-clipped=1.0 2024-08-19 17:56:49,481 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 17:56:54,078 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2550, loss[loss=0.09922, beats_loss=0.01149, ecapa_loss=0.0001295, whisper_loss=0.08643, over 23344.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.0899, over 3798605.48 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:56:59,921 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 17:57:03,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4471390.0, ans=0.125 2024-08-19 17:57:24,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4471490.0, ans=0.125 2024-08-19 17:57:32,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0 2024-08-19 17:57:38,921 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
31 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 17:57:40,356 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 26 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-19 17:57:42,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-19 17:57:56,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4471690.0, ans=0.2 2024-08-19 17:58:03,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4471790.0, ans=0.2 2024-08-19 17:58:11,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-19 17:58:14,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4471790.0, ans=0.125 2024-08-19 17:58:18,182 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 17:58:19,397 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2600, loss[loss=0.1004, beats_loss=0.01286, ecapa_loss=0.0001308, whisper_loss=0.08623, over 22232.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001393, whisper_loss=0.09, over 3803115.62 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:58:37,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4471990.0, ans=0.125 2024-08-19 17:58:47,633 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 32 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-19 17:58:49,733 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 26 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-19 17:58:55,694 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
27 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-19 17:59:08,703 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 28 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-19 17:59:11,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2024-08-19 17:59:17,156 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 21 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 17:59:19,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4472190.0, ans=0.2 2024-08-19 17:59:33,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.323e+01 2.520e+01 2.771e+01 4.731e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 17:59:35,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4472290.0, ans=0.2 2024-08-19 17:59:42,398 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 17:59:47,285 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2650, loss[loss=0.1077, beats_loss=0.01008, ecapa_loss=0.0001288, whisper_loss=0.0963, over 21825.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001406, whisper_loss=0.08985, over 3800890.28 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:00:03,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4472390.0, ans=0.125 2024-08-19 18:00:17,137 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 18:00:21,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4472490.0, ans=0.125 2024-08-19 18:00:57,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4472690.0, ans=0.125 2024-08-19 18:01:02,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4472790.0, ans=0.95 2024-08-19 18:01:03,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2024-08-19 18:01:18,032 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2700, loss[loss=0.09374, beats_loss=0.01261, ecapa_loss=0.0001582, whisper_loss=0.07955, over 14594.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.000141, whisper_loss=0.09015, over 3804870.51 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:01:26,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4472890.0, ans=0.125 2024-08-19 18:02:18,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-19 18:02:20,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4473190.0, ans=0.125 2024-08-19 18:02:21,412 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 18:02:22,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.04 vs. 
limit=15.0 2024-08-19 18:02:24,821 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 12 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 18:02:31,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.406e+01 2.691e+01 2.981e+01 2.904e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-19 18:02:33,786 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 18:02:45,274 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2750, loss[loss=0.09181, beats_loss=0.01095, ecapa_loss=8.864e-05, whisper_loss=0.07998, over 18603.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.08958, over 3796312.86 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:02:55,904 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 18:03:08,822 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4473490.0, ans=0.0 2024-08-19 18:03:09,944 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 18:03:29,344 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 
26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 18:03:31,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4473590.0, ans=0.125 2024-08-19 18:03:31,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4473590.0, ans=0.0 2024-08-19 18:03:37,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4473690.0, ans=0.125 2024-08-19 18:03:55,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4473790.0, ans=0.125 2024-08-19 18:03:55,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4473790.0, ans=0.0 2024-08-19 18:04:13,814 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2800, loss[loss=0.09653, beats_loss=0.01079, ecapa_loss=0.0001271, whisper_loss=0.08447, over 14627.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.0891, over 3756172.70 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:04:15,914 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-19 18:04:28,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4473890.0, ans=0.125 2024-08-19 18:04:38,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-19 18:05:03,866 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
12 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 18:05:28,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.228e+01 2.485e+01 2.852e+01 2.973e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-19 18:05:31,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4474290.0, ans=0.125 2024-08-19 18:05:40,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2024-08-19 18:05:43,144 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2850, loss[loss=0.1301, beats_loss=0.009566, ecapa_loss=0.0001436, whisper_loss=0.1191, over 22562.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001387, whisper_loss=0.08915, over 3764813.40 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:06:07,676 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 18:06:12,377 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-08-19 18:06:24,325 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.088e-02 2024-08-19 18:06:36,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4474690.0, ans=0.1 2024-08-19 18:06:47,481 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 
18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 18:07:01,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4474790.0, ans=0.125 2024-08-19 18:07:11,017 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2900, loss[loss=0.07838, beats_loss=0.01138, ecapa_loss=0.0001628, whisper_loss=0.06537, over 16525.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.0896, over 3778924.15 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:07:15,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4474890.0, ans=0.0 2024-08-19 18:07:47,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4475090.0, ans=0.0 2024-08-19 18:07:49,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4475090.0, ans=0.1 2024-08-19 18:07:53,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2024-08-19 18:07:58,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4475090.0, ans=0.125 2024-08-19 18:08:06,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4475190.0, ans=0.125 2024-08-19 18:08:26,013 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 18:08:27,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.549e+01 2.227e+01 2.443e+01 2.748e+01 5.602e+01, threshold=4.887e+01, percent-clipped=1.0 2024-08-19 18:08:30,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4475290.0, ans=0.0 2024-08-19 18:08:37,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=22.5 2024-08-19 18:08:41,770 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 2950, loss[loss=0.1167, beats_loss=0.009644, ecapa_loss=0.0001728, whisper_loss=0.1053, over 18727.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.0894, over 3759620.76 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:08:56,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-19 18:09:17,032 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 18:09:55,759 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 15 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-19 18:10:12,306 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3000, loss[loss=0.1101, beats_loss=0.01027, ecapa_loss=0.0001735, whisper_loss=0.09815, over 21063.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.08873, over 3789599.73 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:10:12,306 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-19 18:10:48,059 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on ASR_libri: loss=0.2543, beats_loss=0, ecapa_loss=0.0005052, whisper_loss=0.2492, over 931116.00 frames. 
2024-08-19 18:10:59,476 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2516, 2.1877, 2.6501, 1.5895], device='cuda:2') 2024-08-19 18:11:09,423 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003946, beats_loss=0, ecapa_loss=0.0003946, whisper_loss=0, over 944235.00 frames. 2024-08-19 18:12:49,071 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on AT_audioset: loss=0.02308, beats_loss=0.02308, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 18:12:49,075 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-19 18:12:49,311 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 24 from LS+wenet, 37 from Vox, 33 fro AS 2024-08-19 18:12:50,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.71 vs. limit=22.5 2024-08-19 18:12:57,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4475890.0, ans=0.125 2024-08-19 18:13:15,231 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.056e-01 2024-08-19 18:13:15,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-19 18:13:27,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4476090.0, ans=0.0 2024-08-19 18:13:36,329 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 18:14:02,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.325e+01 2.611e+01 2.866e+01 5.886e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-19 18:14:12,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4476290.0, ans=0.125 2024-08-19 18:14:16,864 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3050, loss[loss=0.08991, beats_loss=0.01021, ecapa_loss=0.0001442, whisper_loss=0.07827, over 20548.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.08879, over 3828564.03 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:14:19,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4476390.0, ans=0.2 2024-08-19 18:14:21,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-19 18:14:38,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4476490.0, ans=0.0 2024-08-19 18:14:52,450 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 18:15:01,645 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-19 18:15:19,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-19 18:15:50,660 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3100, loss[loss=0.1054, beats_loss=0.00944, ecapa_loss=0.0001555, whisper_loss=0.09445, over 19843.00 frames. 
], tot_loss[loss=0.1005, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.08863, over 3789835.84 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:16:01,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4476890.0, ans=0.0 2024-08-19 18:16:03,521 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 18:16:17,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4476990.0, ans=0.0 2024-08-19 18:16:27,782 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 27 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-19 18:16:31,225 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 18:16:50,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4477190.0, ans=0.1 2024-08-19 18:17:07,072 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.384e+01 2.601e+01 2.875e+01 4.295e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 18:17:09,192 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-19 18:17:19,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4477290.0, ans=0.1 2024-08-19 18:17:22,606 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3150, loss[loss=0.08498, beats_loss=0.01281, ecapa_loss=0.0001017, whisper_loss=0.07116, over 17102.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001428, whisper_loss=0.08932, over 3771306.92 frames. 
], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:17:37,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2024-08-19 18:17:42,834 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 18:17:43,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4477490.0, ans=0.1 2024-08-19 18:17:48,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4477490.0, ans=0.0 2024-08-19 18:17:57,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4477590.0, ans=0.0 2024-08-19 18:17:58,636 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 30 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 18:18:01,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4477590.0, ans=0.0 2024-08-19 18:18:10,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4477590.0, ans=0.1 2024-08-19 18:18:15,283 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 21 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 18:18:20,818 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 18:18:25,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4477690.0, ans=0.07 2024-08-19 18:18:29,659 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 18:18:30,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2024-08-19 18:18:39,983 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 18:18:52,527 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3200, loss[loss=0.1122, beats_loss=0.01154, ecapa_loss=0.0001192, whisper_loss=0.0995, over 19307.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001427, whisper_loss=0.08871, over 3745336.62 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:18:59,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4477890.0, ans=0.125 2024-08-19 18:19:08,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=12.0 2024-08-19 18:19:20,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4477990.0, ans=0.125 2024-08-19 18:19:46,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-19 18:19:51,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4478190.0, ans=0.125 2024-08-19 18:19:52,149 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 18:20:07,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.317e+01 2.495e+01 2.835e+01 3.728e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-19 18:20:13,168 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 18:20:17,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4478290.0, ans=0.125 2024-08-19 18:20:21,887 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3250, loss[loss=0.107, beats_loss=0.01042, ecapa_loss=0.0001381, whisper_loss=0.09524, over 18580.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001415, whisper_loss=0.08868, over 3771842.86 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:20:57,091 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 18:21:01,588 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 18:21:29,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4478690.0, ans=0.125 2024-08-19 18:21:38,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4478790.0, ans=0.0 2024-08-19 18:21:44,911 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 18:21:49,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4478790.0, ans=0.1 2024-08-19 18:21:54,147 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3300, loss[loss=0.08285, beats_loss=0.01149, ecapa_loss=0.0001353, whisper_loss=0.07, over 22898.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001416, whisper_loss=0.08929, over 3798506.06 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:22:10,216 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 18:22:12,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4478990.0, ans=0.0 2024-08-19 18:22:18,937 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 15 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 18:22:35,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4479090.0, ans=0.125 2024-08-19 18:22:38,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4479090.0, ans=0.0 2024-08-19 18:23:07,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.244e+01 2.479e+01 2.933e+01 3.930e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 18:23:08,110 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 34 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-19 18:23:13,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4479290.0, ans=0.0 2024-08-19 18:23:17,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4479290.0, ans=0.2 2024-08-19 18:23:23,204 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3350, loss[loss=0.07924, beats_loss=0.01472, ecapa_loss=0.0001204, whisper_loss=0.06331, over 17687.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001405, whisper_loss=0.08958, over 3753514.52 frames. 
], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:23:34,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4479390.0, ans=0.125 2024-08-19 18:23:47,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4479490.0, ans=0.125 2024-08-19 18:23:49,109 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 22 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-19 18:23:58,282 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 18:24:00,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4479590.0, ans=0.125 2024-08-19 18:24:22,153 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 18:24:24,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4479690.0, ans=0.2 2024-08-19 18:24:42,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4479790.0, ans=0.1 2024-08-19 18:24:52,563 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3400, loss[loss=0.128, beats_loss=0.008479, ecapa_loss=0.0001472, whisper_loss=0.118, over 20380.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001412, whisper_loss=0.08892, over 3735168.91 frames. 
], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:25:08,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4479990.0, ans=0.0 2024-08-19 18:25:24,031 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.677e+00 2024-08-19 18:25:27,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4479990.0, ans=0.1 2024-08-19 18:25:41,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4480090.0, ans=0.0 2024-08-19 18:26:08,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4480290.0, ans=0.125 2024-08-19 18:26:08,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-19 18:26:09,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.304e+01 2.534e+01 2.812e+01 7.025e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-19 18:26:21,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4480290.0, ans=0.2 2024-08-19 18:26:24,785 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3450, loss[loss=0.09021, beats_loss=0.01381, ecapa_loss=0.0001108, whisper_loss=0.07529, over 22430.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.08912, over 3768078.99 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:26:45,472 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 22 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-19 18:26:59,139 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
15 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-19 18:27:15,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4480690.0, ans=0.0 2024-08-19 18:27:41,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4480790.0, ans=0.0 2024-08-19 18:27:50,097 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3500, loss[loss=0.08608, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.07403, over 16904.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01053, ecapa_loss=0.000141, whisper_loss=0.08867, over 3795791.55 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:28:02,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-08-19 18:28:07,168 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 18:28:22,424 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 18:28:26,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481090.0, ans=0.1 2024-08-19 18:28:38,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4481090.0, ans=0.04949747468305833 2024-08-19 18:28:42,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=12.0 2024-08-19 18:28:49,792 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 
22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 18:29:01,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.232e+01 2.458e+01 2.911e+01 6.376e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-19 18:29:03,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4481290.0, ans=0.125 2024-08-19 18:29:14,274 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3550, loss[loss=0.1083, beats_loss=0.00799, ecapa_loss=0.0001765, whisper_loss=0.09858, over 14883.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.08964, over 3806162.82 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:29:23,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4481390.0, ans=0.09899494936611666 2024-08-19 18:29:28,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4481390.0, ans=0.1 2024-08-19 18:29:38,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4481490.0, ans=0.015 2024-08-19 18:29:47,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4481590.0, ans=0.1 2024-08-19 18:29:47,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4481590.0, ans=0.125 2024-08-19 18:30:08,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4481690.0, ans=0.0 2024-08-19 18:30:11,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4481690.0, ans=0.125 2024-08-19 18:30:25,541 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481790.0, ans=0.1 2024-08-19 18:30:34,755 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3600, loss[loss=0.1134, beats_loss=0.009378, ecapa_loss=0.0001267, whisper_loss=0.1027, over 19467.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001407, whisper_loss=0.08983, over 3806100.03 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:30:39,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4481890.0, ans=0.2 2024-08-19 18:30:42,874 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 9 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 18:30:45,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=12.0 2024-08-19 18:30:48,106 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 27 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 18:30:56,476 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-19 18:31:00,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4481990.0, ans=0.0 2024-08-19 18:31:21,433 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 18:31:22,771 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 19 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-19 18:31:25,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2024-08-19 18:31:28,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.89 vs. 
limit=10.0 2024-08-19 18:31:34,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4482190.0, ans=0.125 2024-08-19 18:31:34,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-19 18:31:39,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4482290.0, ans=0.125 2024-08-19 18:31:42,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.192e+01 2.433e+01 2.584e+01 3.997e+01, threshold=4.865e+01, percent-clipped=0.0 2024-08-19 18:31:54,777 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3650, loss[loss=0.09709, beats_loss=0.008697, ecapa_loss=0.0001484, whisper_loss=0.08691, over 19163.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.08919, over 3791214.84 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:32:16,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4482490.0, ans=0.125 2024-08-19 18:32:23,449 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 18:32:35,061 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 18:32:59,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.63 vs. limit=22.5 2024-08-19 18:33:05,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.26 vs. 
limit=22.5 2024-08-19 18:33:14,982 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3700, loss[loss=0.1053, beats_loss=0.01229, ecapa_loss=0.0001355, whisper_loss=0.09167, over 23110.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001406, whisper_loss=0.08968, over 3776260.23 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:33:20,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4482890.0, ans=0.125 2024-08-19 18:33:36,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2024-08-19 18:33:53,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4483090.0, ans=0.125 2024-08-19 18:33:59,446 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 24 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-19 18:34:03,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4483090.0, ans=0.1 2024-08-19 18:34:06,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 18:34:06,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4483090.0, ans=0.125 2024-08-19 18:34:08,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4483090.0, ans=0.125 2024-08-19 18:34:20,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4483190.0, ans=0.0 2024-08-19 18:34:28,954 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4483290.0, ans=0.125 2024-08-19 18:34:30,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.276e+01 2.511e+01 2.757e+01 7.975e+01, threshold=5.022e+01, percent-clipped=3.0 2024-08-19 18:34:42,961 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3750, loss[loss=0.1169, beats_loss=0.009432, ecapa_loss=0.0001202, whisper_loss=0.1063, over 19261.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001423, whisper_loss=0.08925, over 3784204.96 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:34:53,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-19 18:35:07,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:07,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:09,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:13,451 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 18:35:28,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4483590.0, ans=0.1 2024-08-19 18:35:40,152 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 23 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-19 18:35:42,716 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 18:35:47,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4483790.0, ans=0.2 2024-08-19 18:35:48,937 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 18:35:52,699 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.245e+01 2024-08-19 18:36:03,416 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3800, loss[loss=0.1124, beats_loss=0.01244, ecapa_loss=0.0001357, whisper_loss=0.09861, over 19403.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001418, whisper_loss=0.08903, over 3773627.24 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:36:14,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4483890.0, ans=0.0 2024-08-19 18:36:18,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4483990.0, ans=0.0 2024-08-19 18:36:19,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4483990.0, ans=0.05 2024-08-19 18:36:30,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4483990.0, ans=0.125 2024-08-19 18:36:54,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4484190.0, ans=0.2 2024-08-19 18:37:03,415 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 11 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 18:37:04,619 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
20 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-19 18:37:05,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4484290.0, ans=0.09899494936611666 2024-08-19 18:37:09,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.312e+01 2.559e+01 2.923e+01 4.060e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:37:22,520 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3850, loss[loss=0.09234, beats_loss=0.009519, ecapa_loss=0.0001658, whisper_loss=0.08116, over 19112.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01042, ecapa_loss=0.0001434, whisper_loss=0.08813, over 3744707.44 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:37:45,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4484490.0, ans=0.07 2024-08-19 18:37:46,639 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 18:37:49,705 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 31 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 18:37:51,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4484490.0, ans=0.0 2024-08-19 18:38:19,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4484690.0, ans=0.1 2024-08-19 18:38:28,694 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 16 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 18:38:30,294 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 13 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-19 18:38:40,574 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3900, loss[loss=0.1125, beats_loss=0.008777, ecapa_loss=0.0001554, whisper_loss=0.1022, over 19709.00 frames. 
], tot_loss[loss=0.09965, beats_loss=0.01049, ecapa_loss=0.000143, whisper_loss=0.08773, over 3739843.65 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:38:44,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4484890.0, ans=0.125 2024-08-19 18:39:04,093 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 18:39:04,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-08-19 18:39:08,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4484990.0, ans=15.0 2024-08-19 18:39:22,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4485090.0, ans=0.07 2024-08-19 18:39:47,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.325e+01 2.529e+01 2.804e+01 3.948e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 18:40:00,819 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 3950, loss[loss=0.1141, beats_loss=0.008229, ecapa_loss=0.0001572, whisper_loss=0.1043, over 17463.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0105, ecapa_loss=0.0001423, whisper_loss=0.08816, over 3761124.81 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:40:06,919 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 18:40:38,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4485590.0, ans=0.1 2024-08-19 18:40:38,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4485590.0, ans=0.2 2024-08-19 18:40:49,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4485690.0, ans=0.0 2024-08-19 18:41:00,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4485690.0, ans=0.5 2024-08-19 18:41:09,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-19 18:41:14,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2024-08-19 18:41:16,362 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 23 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 18:41:21,274 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 34 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 18:41:22,596 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4000, loss[loss=0.1316, beats_loss=0.007466, ecapa_loss=0.0001514, whisper_loss=0.1226, over 21766.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001427, whisper_loss=0.08891, over 3767871.58 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:41:30,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4485890.0, ans=0.125 2024-08-19 18:41:42,104 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
21 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-19 18:41:44,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4485990.0, ans=0.0 2024-08-19 18:41:44,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4485990.0, ans=0.2 2024-08-19 18:41:58,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486090.0, ans=0.1 2024-08-19 18:42:29,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.300e+01 2.585e+01 3.012e+01 4.802e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-19 18:42:41,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2024-08-19 18:42:42,332 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4050, loss[loss=0.1155, beats_loss=0.00988, ecapa_loss=0.0001083, whisper_loss=0.1046, over 14697.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01043, ecapa_loss=0.0001432, whisper_loss=0.08864, over 3769056.97 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:42:47,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4486390.0, ans=0.0 2024-08-19 18:43:00,202 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 18 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-19 18:43:02,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486490.0, ans=0.1 2024-08-19 18:43:04,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. 
limit=22.5 2024-08-19 18:43:23,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4486590.0, ans=0.125 2024-08-19 18:43:27,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4486590.0, ans=0.0 2024-08-19 18:43:55,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4486790.0, ans=0.125 2024-08-19 18:44:01,534 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4100, loss[loss=0.1019, beats_loss=0.008819, ecapa_loss=0.0001602, whisper_loss=0.09147, over 18368.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001424, whisper_loss=0.09012, over 3767042.57 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:44:07,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4486890.0, ans=0.125 2024-08-19 18:44:25,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486990.0, ans=0.1 2024-08-19 18:44:30,955 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 18:44:47,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2024-08-19 18:44:54,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4487190.0, ans=0.0 2024-08-19 18:44:58,862 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 18:45:07,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.401e+01 2.726e+01 3.123e+01 1.504e+02, threshold=5.451e+01, percent-clipped=2.0 2024-08-19 18:45:11,267 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 18:45:20,295 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4150, loss[loss=0.09584, beats_loss=0.01148, ecapa_loss=0.0001235, whisper_loss=0.08312, over 24022.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001419, whisper_loss=0.09097, over 3823092.61 frames. ], batch size: 96, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:45:22,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4487390.0, ans=0.2 2024-08-19 18:45:28,274 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 18:45:36,937 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4487490.0, ans=0.1 2024-08-19 18:45:45,778 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 18:45:50,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4487590.0, ans=0.1 2024-08-19 18:45:54,896 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 18:45:56,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4487590.0, ans=0.125 2024-08-19 18:46:01,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4487590.0, ans=0.125 2024-08-19 18:46:04,745 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 18:46:31,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4487790.0, ans=0.0 2024-08-19 18:46:40,451 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4200, loss[loss=0.1027, beats_loss=0.009293, ecapa_loss=0.0001381, whisper_loss=0.09198, over 16512.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01026, ecapa_loss=0.0001436, whisper_loss=0.09155, over 3801997.23 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:47:10,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4487990.0, ans=0.125 2024-08-19 18:47:14,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4488090.0, ans=0.1 2024-08-19 18:47:38,828 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 19 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-19 18:47:40,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4488190.0, ans=0.125 2024-08-19 18:47:49,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.234e+01 2.488e+01 2.803e+01 1.323e+02, threshold=4.977e+01, percent-clipped=2.0 2024-08-19 18:47:58,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4488290.0, ans=0.0 2024-08-19 18:47:59,539 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 28 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 18:48:00,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. 
limit=15.0 2024-08-19 18:48:00,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.26 vs. limit=22.5 2024-08-19 18:48:02,444 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4250, loss[loss=0.08448, beats_loss=0.008987, ecapa_loss=0.0001103, whisper_loss=0.07439, over 14077.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001435, whisper_loss=0.09059, over 3787710.17 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:48:37,594 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-19 18:48:47,762 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 14 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 18:49:22,714 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4300, loss[loss=0.1144, beats_loss=0.008275, ecapa_loss=0.0001876, whisper_loss=0.1043, over 16871.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001437, whisper_loss=0.09045, over 3793743.35 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:50:15,190 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 18:50:28,633 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-08-19 18:50:30,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.303e+01 2.487e+01 2.877e+01 4.114e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-19 18:50:43,619 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4350, loss[loss=0.104, beats_loss=0.01165, ecapa_loss=0.0001339, whisper_loss=0.09102, over 18032.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.09061, over 3811152.91 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:50:44,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-19 18:50:54,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4489390.0, ans=0.0 2024-08-19 18:50:57,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-19 18:51:14,296 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-19 18:51:23,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.02 vs. limit=22.5 2024-08-19 18:51:36,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4489690.0, ans=0.125 2024-08-19 18:51:53,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4489790.0, ans=0.125 2024-08-19 18:52:03,944 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4400, loss[loss=0.1108, beats_loss=0.01046, ecapa_loss=0.0001369, whisper_loss=0.09893, over 22635.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001429, whisper_loss=0.09034, over 3811795.11 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:52:10,362 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 18:52:20,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4489990.0, ans=0.0 2024-08-19 18:52:37,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4490090.0, ans=0.125 2024-08-19 18:53:11,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.271e+01 2.455e+01 2.760e+01 4.090e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-19 18:53:15,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4490290.0, ans=0.1 2024-08-19 18:53:23,770 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4450, loss[loss=0.08339, beats_loss=0.01115, ecapa_loss=0.000157, whisper_loss=0.07067, over 20406.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001417, whisper_loss=0.08939, over 3792382.13 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:53:30,361 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 18:54:20,106 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 31 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 18:54:24,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4490690.0, ans=0.125 2024-08-19 18:54:43,422 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 22 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-19 18:54:44,725 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4500, loss[loss=0.09847, beats_loss=0.01032, ecapa_loss=0.0001559, whisper_loss=0.08659, over 21274.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001425, whisper_loss=0.0897, over 3793351.46 frames. 
], batch size: 86, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:54:47,125 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 18:55:09,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0 2024-08-19 18:55:17,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4491090.0, ans=0.125 2024-08-19 18:55:27,713 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 18:55:37,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4491190.0, ans=0.0 2024-08-19 18:55:38,859 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 18:55:47,156 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-19 18:55:48,745 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 18:55:54,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.263e+01 2.559e+01 2.809e+01 3.466e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:56:07,822 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4550, loss[loss=0.09749, beats_loss=0.01098, ecapa_loss=0.0001147, whisper_loss=0.08536, over 23119.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001426, whisper_loss=0.08922, over 3792047.93 frames. 
], batch size: 88, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:56:13,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4491390.0, ans=0.125 2024-08-19 18:56:28,335 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 18:56:30,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4491490.0, ans=0.015 2024-08-19 18:56:39,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=22.5 2024-08-19 18:56:40,509 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 18:56:42,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-19 18:57:23,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.75 vs. limit=10.0 2024-08-19 18:57:28,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2024-08-19 18:57:32,997 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4600, loss[loss=0.0701, beats_loss=0.01468, ecapa_loss=0.0001196, whisper_loss=0.05423, over 16551.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001421, whisper_loss=0.08976, over 3803979.38 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:57:34,784 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 18:57:39,753 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 
15 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 18:57:42,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4491890.0, ans=0.0 2024-08-19 18:57:53,918 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 11 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 18:57:56,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4491990.0, ans=0.04949747468305833 2024-08-19 18:57:59,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4491990.0, ans=0.1 2024-08-19 18:58:11,027 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 38 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 18:58:22,234 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-19 18:58:35,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=22.5 2024-08-19 18:58:39,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4492290.0, ans=0.0 2024-08-19 18:58:46,007 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.297e+01 2.492e+01 2.828e+01 4.082e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-19 18:58:53,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4492290.0, ans=0.125 2024-08-19 18:58:57,896 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4650, loss[loss=0.09194, beats_loss=0.01179, ecapa_loss=0.0001575, whisper_loss=0.07858, over 18752.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001424, whisper_loss=0.08965, over 3835475.74 frames. 
], batch size: 80, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:59:12,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4492390.0, ans=0.2 2024-08-19 18:59:36,855 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-19 18:59:37,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4492590.0, ans=0.95 2024-08-19 18:59:58,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-08-19 18:59:59,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4492690.0, ans=0.125 2024-08-19 19:00:02,154 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 19:00:05,330 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 19:00:16,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4492790.0, ans=0.1 2024-08-19 19:00:22,845 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4700, loss[loss=0.1179, beats_loss=0.0101, ecapa_loss=0.0001237, whisper_loss=0.1066, over 23163.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001422, whisper_loss=0.08916, over 3827679.38 frames. 
], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:00:36,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4492890.0, ans=0.125 2024-08-19 19:00:38,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4492990.0, ans=0.2 2024-08-19 19:00:51,714 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 19:00:56,824 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 19:01:00,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2024-08-19 19:01:13,065 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 19:01:26,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4493190.0, ans=0.1 2024-08-19 19:01:28,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4493290.0, ans=0.125 2024-08-19 19:01:34,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.345e+01 2.552e+01 2.786e+01 4.462e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 19:01:45,836 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4750, loss[loss=0.1047, beats_loss=0.01044, ecapa_loss=0.0001073, whisper_loss=0.09321, over 16714.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.08905, over 3817114.53 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:01:49,256 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
33 from LS+wenet, 13 from Vox, 41 from AS 2024-08-19 19:01:59,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4493390.0, ans=0.125 2024-08-19 19:02:02,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4493490.0, ans=0.2 2024-08-19 19:02:04,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4493490.0, ans=0.1 2024-08-19 19:02:09,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4493490.0, ans=0.125 2024-08-19 19:02:11,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2024-08-19 19:02:11,917 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 from AS 2024-08-19 19:02:13,800 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 40 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 19:02:18,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-19 19:02:21,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4493590.0, ans=0.0 2024-08-19 19:02:32,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4493590.0, ans=0.125 2024-08-19 19:02:40,094 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 17 from Vox, 48 from AS 2024-08-19 19:03:09,639 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4800, loss[loss=0.1052, beats_loss=0.01041, ecapa_loss=0.0001538, whisper_loss=0.09325, over 20091.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.000141, whisper_loss=0.08899, over 3843432.13 frames. ], batch size: 80, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:03:16,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4493890.0, ans=0.1 2024-08-19 19:03:17,518 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 from AS 2024-08-19 19:03:20,768 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 from AS 2024-08-19 19:03:22,392 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 from AS 2024-08-19 19:03:59,214 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. limit=10.0 2024-08-19 19:04:01,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4494190.0, ans=0.2 2024-08-19 19:04:04,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4494190.0, ans=0.125 2024-08-19 19:04:16,799 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 from AS 2024-08-19 19:04:21,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.336e+01 2.600e+01 2.820e+01 4.344e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-19 19:04:27,149 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 from AS 2024-08-19 19:04:33,252 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4850, loss[loss=0.1025, beats_loss=0.008862, ecapa_loss=0.0001398, whisper_loss=0.09223, over 13627.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.08939, over 3840381.06 frames. 
], batch size: 50, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:04:37,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4494390.0, ans=0.125 2024-08-19 19:04:38,299 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 from AS 2024-08-19 19:04:50,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4494490.0, ans=0.0 2024-08-19 19:04:51,754 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 from AS 2024-08-19 19:04:57,090 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 from AS 2024-08-19 19:04:59,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=15.0 2024-08-19 19:05:10,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4494590.0, ans=0.125 2024-08-19 19:05:10,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4494590.0, ans=0.125 2024-08-19 19:05:16,752 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
17 from LS+wenet, 31 from Vox, 41 from AS 2024-08-19 19:05:32,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4494690.0, ans=0.125 2024-08-19 19:05:33,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4494690.0, ans=0.0 2024-08-19 19:05:37,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4494690.0, ans=0.125 2024-08-19 19:05:52,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-08-19 19:05:56,417 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4900, loss[loss=0.1066, beats_loss=0.009427, ecapa_loss=0.0001271, whisper_loss=0.09589, over 16534.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001412, whisper_loss=0.09024, over 3843555.38 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:06:13,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.04 vs. limit=6.0 2024-08-19 19:06:37,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4495090.0, ans=0.125 2024-08-19 19:06:37,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4495090.0, ans=0.125 2024-08-19 19:06:37,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. 
limit=15.0 2024-08-19 19:06:40,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4495090.0, ans=0.0 2024-08-19 19:06:54,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4495190.0, ans=0.0 2024-08-19 19:07:10,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.336e+01 2.530e+01 2.860e+01 1.367e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-19 19:07:22,619 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 4950, loss[loss=0.1266, beats_loss=0.009324, ecapa_loss=0.0001504, whisper_loss=0.1157, over 23021.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001417, whisper_loss=0.09083, over 3873658.10 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:07:36,944 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 19 from Vox, 37 from AS 2024-08-19 19:07:43,659 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
19 from LS+wenet, 14 from Vox, 20 from AS 2024-08-19 19:07:52,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4495490.0, ans=0.125 2024-08-19 19:07:52,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4495490.0, ans=0.125 2024-08-19 19:07:53,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4495490.0, ans=0.125 2024-08-19 19:08:03,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4495590.0, ans=0.0 2024-08-19 19:08:06,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=4495590.0, ans=12.0 2024-08-19 19:08:38,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4495790.0, ans=0.0 2024-08-19 19:08:49,524 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5000, loss[loss=0.1079, beats_loss=0.009376, ecapa_loss=0.0001343, whisper_loss=0.09716, over 22762.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001417, whisper_loss=0.09111, over 3888430.01 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:09:22,277 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4495990.0, ans=0.125 2024-08-19 19:09:43,806 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 19:09:47,352 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
20 from LS+wenet, 13 from Vox, 30 from AS 2024-08-19 19:10:04,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.356e+01 2.547e+01 2.785e+01 7.027e+01, threshold=5.094e+01, percent-clipped=1.0 2024-08-19 19:10:04,805 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 16 from LS+wenet, 25 from Vox, 33 from AS 2024-08-19 19:10:16,955 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5050, loss[loss=0.1397, beats_loss=0.007946, ecapa_loss=0.000135, whisper_loss=0.1304, over 20340.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.000142, whisper_loss=0.09121, over 3848501.70 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:10:25,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4496390.0, ans=0.95 2024-08-19 19:10:26,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4496390.0, ans=0.125 2024-08-19 19:10:34,505 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 from AS 2024-08-19 19:10:37,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4496490.0, ans=0.0 2024-08-19 19:11:30,387 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 22 from LS+wenet, 11 from Vox, 28 from AS 2024-08-19 19:11:42,209 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5100, loss[loss=0.101, beats_loss=0.009252, ecapa_loss=0.0001777, whisper_loss=0.08994, over 20720.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001411, whisper_loss=0.09066, over 3809965.84 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:11:42,439 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
25 from LS+wenet, 11 from Vox, 27 from AS 2024-08-19 19:11:54,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4496890.0, ans=10.0 2024-08-19 19:12:16,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4497090.0, ans=0.0 2024-08-19 19:12:25,462 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS 2024-08-19 19:12:29,237 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS 2024-08-19 19:12:33,609 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 30 from LS+wenet, 15 from Vox, 25 from AS 2024-08-19 19:12:45,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4497190.0, ans=0.125 2024-08-19 19:12:49,242 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.845e-02 2024-08-19 19:12:53,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.293e+01 2.514e+01 2.831e+01 4.907e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-19 19:13:04,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4497390.0, ans=0.2 2024-08-19 19:13:05,687 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5150, loss[loss=0.09368, beats_loss=0.01049, ecapa_loss=0.0001334, whisper_loss=0.08186, over 19642.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001411, whisper_loss=0.0901, over 3797424.37 frames. 
], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:13:24,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4497490.0, ans=0.5 2024-08-19 19:13:32,395 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 from AS 2024-08-19 19:13:39,972 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS 2024-08-19 19:13:44,814 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 19:13:53,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4497590.0, ans=0.0 2024-08-19 19:14:06,089 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 from AS 2024-08-19 19:14:27,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2024-08-19 19:14:29,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4497790.0, ans=0.125 2024-08-19 19:14:33,135 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5200, loss[loss=0.1047, beats_loss=0.0114, ecapa_loss=0.0001334, whisper_loss=0.09202, over 22058.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001422, whisper_loss=0.09045, over 3812459.64 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:14:55,523 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 19:15:12,093 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.90 vs. 
limit=22.5 2024-08-19 19:15:22,773 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:15:27,263 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 35 from LS+wenet, 16 from Vox, 39 from AS 2024-08-19 19:15:29,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4498190.0, ans=0.125 2024-08-19 19:15:33,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4498190.0, ans=0.1 2024-08-19 19:15:44,626 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 from AS 2024-08-19 19:15:45,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4498290.0, ans=0.125 2024-08-19 19:15:46,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.272e+01 2.588e+01 2.852e+01 4.438e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-19 19:15:57,701 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5250, loss[loss=0.1107, beats_loss=0.01004, ecapa_loss=0.0001324, whisper_loss=0.09938, over 21835.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.0911, over 3802347.90 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:16:14,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4498490.0, ans=0.1 2024-08-19 19:16:31,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2024-08-19 19:16:37,802 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
21 from LS+wenet, 26 from Vox, 32 from AS 2024-08-19 19:16:38,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4498590.0, ans=0.2 2024-08-19 19:16:42,780 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 31 from Vox, 36 from AS 2024-08-19 19:16:47,818 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 19:16:51,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4498690.0, ans=0.125 2024-08-19 19:16:52,805 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 25 from LS+wenet, 29 from Vox, 40 from AS 2024-08-19 19:16:58,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4498690.0, ans=0.0 2024-08-19 19:17:19,652 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5300, loss[loss=0.09992, beats_loss=0.01224, ecapa_loss=9.614e-05, whisper_loss=0.08672, over 16583.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001418, whisper_loss=0.09055, over 3799985.39 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:17:33,161 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 
16 from LS+wenet, 15 from Vox, 19 from AS 2024-08-19 19:17:34,191 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07655883580446243, model_norm_threshold=51.76279067993164 2024-08-19 19:17:34,354 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.480e+05, grad_sumsq=1.406e+07, orig_rms_sq=1.053e-02 2024-08-19 19:18:30,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.288e+01 2.528e+01 2.946e+01 6.761e+02, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 19:18:42,117 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5350, loss[loss=0.08937, beats_loss=0.01102, ecapa_loss=0.0001539, whisper_loss=0.07682, over 21578.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.08959, over 3771525.16 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:18:48,940 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
24 from LS+wenet, 25 from Vox, 38 from AS 2024-08-19 19:18:57,365 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05797187611460686, model_norm_threshold=50.55705261230469 2024-08-19 19:18:57,531 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.0.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.406e+05, grad_sumsq=1.406e+05, orig_rms_sq=1.000e+00 2024-08-19 19:19:04,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4499490.0, ans=0.2 2024-08-19 19:19:04,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4499490.0, ans=0.1 2024-08-19 19:19:14,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4499490.0, ans=0.2 2024-08-19 19:19:14,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4499490.0, ans=0.125 2024-08-19 19:19:15,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-19 19:19:20,188 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 23 from LS+wenet, 11 from Vox, 22 from AS 2024-08-19 19:19:20,428 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:19:21,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4499590.0, ans=0.125 2024-08-19 19:19:35,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4499590.0, ans=0.0 2024-08-19 19:19:41,585 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
13 from LS+wenet, 17 from Vox, 43 from AS 2024-08-19 19:19:44,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2024-08-19 19:19:52,512 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 13 from Vox, 37 from AS 2024-08-19 19:20:12,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4499890.0, ans=0.125 2024-08-19 19:20:13,374 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5400, loss[loss=0.1054, beats_loss=0.01047, ecapa_loss=0.0001556, whisper_loss=0.09342, over 21748.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001413, whisper_loss=0.08986, over 3785096.93 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:20:15,832 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:20:36,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4499990.0, ans=0.125 2024-08-19 19:20:36,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4499990.0, ans=0.0 2024-08-19 19:20:38,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4499990.0, ans=0.2 2024-08-19 19:20:40,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=4499990.0, ans=0.1 2024-08-19 19:20:52,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4500090.0, ans=0.125 2024-08-19 19:20:54,920 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
26 from LS+wenet, 26 from Vox, 35 from AS 2024-08-19 19:21:05,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4500190.0, ans=0.125 2024-08-19 19:21:06,359 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 from AS 2024-08-19 19:21:16,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-19 19:21:27,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.273e+01 2.608e+01 3.002e+01 8.721e+02, threshold=5.217e+01, percent-clipped=3.0 2024-08-19 19:21:39,178 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5450, loss[loss=0.0902, beats_loss=0.009447, ecapa_loss=0.000171, whisper_loss=0.07905, over 20431.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.09061, over 3794953.10 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:21:40,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4500390.0, ans=0.125 2024-08-19 19:22:00,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4500490.0, ans=0.2 2024-08-19 19:22:01,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4500490.0, ans=0.125 2024-08-19 19:22:02,934 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 30 from LS+wenet, 23 from Vox, 32 from AS 2024-08-19 19:22:28,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4500590.0, ans=0.1 2024-08-19 19:22:31,349 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
34 from LS+wenet, 20 from Vox, 27 from AS 2024-08-19 19:22:38,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4500690.0, ans=0.2 2024-08-19 19:22:45,413 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-19 19:23:06,169 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 14 from LS+wenet, 19 from Vox, 37 from AS 2024-08-19 19:23:08,980 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5500, loss[loss=0.1048, beats_loss=0.01105, ecapa_loss=0.0001304, whisper_loss=0.09244, over 21790.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.09053, over 3825021.24 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:23:15,116 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 from AS 2024-08-19 19:23:37,138 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
24 from LS+wenet, 17 from Vox, 38 from AS 2024-08-19 19:23:47,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4501090.0, ans=0.125 2024-08-19 19:23:58,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4501090.0, ans=0.2 2024-08-19 19:23:59,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4501190.0, ans=0.0 2024-08-19 19:24:08,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4501190.0, ans=0.1 2024-08-19 19:24:25,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.236e+01 2.438e+01 2.713e+01 9.093e+01, threshold=4.875e+01, percent-clipped=1.0 2024-08-19 19:24:39,707 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5550, loss[loss=0.0818, beats_loss=0.01236, ecapa_loss=0.0001318, whisper_loss=0.06812, over 22610.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.09008, over 3803329.75 frames. ], batch size: 95, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:24:46,290 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 from AS 2024-08-19 19:24:57,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-19 19:25:20,750 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 19 from LS+wenet, 29 from Vox, 43 from AS 2024-08-19 19:25:42,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4501690.0, ans=0.125 2024-08-19 19:25:52,515 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
24 from LS+wenet, 33 from Vox, 31 from AS 2024-08-19 19:25:58,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4501790.0, ans=0.125 2024-08-19 19:26:12,777 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-19 19:26:15,401 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5600, loss[loss=0.09826, beats_loss=0.009773, ecapa_loss=0.0001524, whisper_loss=0.08696, over 19098.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.08936, over 3794005.69 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:26:47,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4501990.0, ans=0.0 2024-08-19 19:26:54,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4502090.0, ans=0.1 2024-08-19 19:27:08,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4502090.0, ans=0.0 2024-08-19 19:27:11,996 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 19:27:14,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4502190.0, ans=0.0 2024-08-19 19:27:18,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4502190.0, ans=0.2 2024-08-19 19:27:20,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4502190.0, ans=0.0 2024-08-19 19:27:26,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=15.0 2024-08-19 19:27:29,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4502190.0, ans=0.1 2024-08-19 19:27:35,048 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 from AS 2024-08-19 19:27:37,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.290e+01 2.503e+01 2.698e+01 5.557e+01, threshold=5.007e+01, percent-clipped=1.0 2024-08-19 19:27:48,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-08-19 19:27:51,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4502390.0, ans=0.0 2024-08-19 19:27:51,977 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5650, loss[loss=0.08994, beats_loss=0.01295, ecapa_loss=0.0001177, whisper_loss=0.07582, over 15489.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001405, whisper_loss=0.09013, over 3836211.68 frames. 
], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:27:55,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-19 19:27:58,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4502390.0, ans=0.125 2024-08-19 19:28:04,422 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-19 19:28:31,711 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 from AS 2024-08-19 19:28:35,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4502590.0, ans=0.125 2024-08-19 19:28:44,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4502590.0, ans=0.0 2024-08-19 19:28:51,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4502690.0, ans=0.0 2024-08-19 19:29:27,595 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5700, loss[loss=0.1033, beats_loss=0.01099, ecapa_loss=0.0001481, whisper_loss=0.09082, over 19026.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001404, whisper_loss=0.08959, over 3849751.21 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:29:28,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4502890.0, ans=0.0 2024-08-19 19:29:47,899 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 19:29:48,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-19 19:29:59,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4502990.0, ans=0.125 2024-08-19 19:30:13,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4503090.0, ans=0.125 2024-08-19 19:30:15,919 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 19:30:45,927 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 19:30:51,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.276e+01 2.546e+01 2.979e+01 5.244e+01, threshold=5.092e+01, percent-clipped=1.0 2024-08-19 19:31:04,811 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5750, loss[loss=0.09088, beats_loss=0.01231, ecapa_loss=0.00013, whisper_loss=0.07727, over 20940.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001391, whisper_loss=0.08853, over 3834325.92 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:31:09,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4503390.0, ans=0.125 2024-08-19 19:31:37,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4503490.0, ans=0.125 2024-08-19 19:31:38,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. 
limit=15.0 2024-08-19 19:31:50,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4503590.0, ans=0.95 2024-08-19 19:31:52,385 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4503590.0, ans=0.125 2024-08-19 19:31:57,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=22.5 2024-08-19 19:32:35,819 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5800, loss[loss=0.09095, beats_loss=0.01245, ecapa_loss=0.0001304, whisper_loss=0.07719, over 22365.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01063, ecapa_loss=0.0001404, whisper_loss=0.08816, over 3823200.79 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:32:46,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2024-08-19 19:32:58,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4503990.0, ans=0.0 2024-08-19 19:33:01,314 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 19:33:01,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4503990.0, ans=0.2 2024-08-19 19:33:19,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4504090.0, ans=0.125 2024-08-19 19:33:28,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4504090.0, ans=0.0 2024-08-19 19:33:36,398 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 19:33:50,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4504190.0, ans=0.125 2024-08-19 19:33:51,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4504290.0, ans=0.125 2024-08-19 19:33:58,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.339e+01 2.561e+01 2.956e+01 4.463e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-19 19:34:11,252 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5850, loss[loss=0.1006, beats_loss=0.01671, ecapa_loss=0.0001467, whisper_loss=0.08244, over 18990.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01067, ecapa_loss=0.0001396, whisper_loss=0.08842, over 3815707.85 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:34:27,750 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.447e-03 2024-08-19 19:34:40,154 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0 2024-08-19 19:34:50,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4504590.0, ans=0.125 2024-08-19 19:35:04,651 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-19 19:35:23,417 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 19:35:31,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4504790.0, ans=0.07 2024-08-19 19:35:43,486 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
30 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 19:35:44,444 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5900, loss[loss=0.1146, beats_loss=0.008895, ecapa_loss=0.0001664, whisper_loss=0.104, over 20524.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001409, whisper_loss=0.08877, over 3798054.42 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:36:01,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4504890.0, ans=0.04949747468305833 2024-08-19 19:36:06,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4504990.0, ans=0.125 2024-08-19 19:36:27,877 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 19:36:29,776 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 19:36:38,022 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-19 19:36:38,224 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:36:55,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4505190.0, ans=0.2 2024-08-19 19:36:55,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4505190.0, ans=0.125 2024-08-19 19:37:03,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4505290.0, ans=0.125 2024-08-19 19:37:09,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.259e+01 2.428e+01 2.766e+01 1.765e+02, threshold=4.857e+01, percent-clipped=1.0 2024-08-19 19:37:23,691 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 5950, loss[loss=0.1021, beats_loss=0.01238, ecapa_loss=0.0001513, whisper_loss=0.08817, over 22451.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.08938, over 3817652.27 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:37:31,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4505390.0, ans=0.04949747468305833 2024-08-19 19:37:35,120 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-19 19:38:00,585 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 19:38:14,504 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 19:38:20,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4505690.0, ans=0.1 2024-08-19 19:38:24,842 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-08-19 19:38:50,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4505790.0, ans=0.2 2024-08-19 19:38:58,744 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6000, loss[loss=0.09635, beats_loss=0.01059, ecapa_loss=0.0001422, whisper_loss=0.08434, over 21164.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001406, whisper_loss=0.08955, over 3810865.84 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:38:58,744 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-19 19:39:35,497 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005172, whisper_loss=0.2488, over 931116.00 frames. 2024-08-19 19:39:57,491 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames. 2024-08-19 19:40:53,340 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3960, 2.6653, 2.9889, 2.8070], device='cuda:2') 2024-08-19 19:41:38,259 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 19:41:38,262 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-19 19:41:46,652 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
12 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 19:42:16,492 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 22 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-19 19:42:20,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4506090.0, ans=0.05 2024-08-19 19:42:22,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4506090.0, ans=0.125 2024-08-19 19:42:55,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.381e+01 2.694e+01 2.980e+01 4.120e+01, threshold=5.388e+01, percent-clipped=0.0 2024-08-19 19:43:07,815 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6050, loss[loss=0.1074, beats_loss=0.008791, ecapa_loss=0.0001333, whisper_loss=0.09732, over 21351.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001401, whisper_loss=0.08926, over 3808641.66 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:43:33,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-19 19:43:45,717 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 19:43:45,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4506590.0, ans=0.125 2024-08-19 19:44:10,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=15.0 2024-08-19 19:44:13,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4506690.0, ans=0.125 2024-08-19 19:44:18,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4506790.0, ans=0.2 2024-08-19 19:44:27,327 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 19:44:37,711 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6100, loss[loss=0.1096, beats_loss=0.01058, ecapa_loss=0.0001681, whisper_loss=0.0973, over 20807.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01057, ecapa_loss=0.0001404, whisper_loss=0.08913, over 3827568.09 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:45:08,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4506990.0, ans=0.05 2024-08-19 19:45:08,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0 2024-08-19 19:45:19,523 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 19:45:28,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4507190.0, ans=0.125 2024-08-19 19:45:33,901 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.67 vs. 
limit=10.0 2024-08-19 19:45:42,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4507190.0, ans=0.125 2024-08-19 19:45:52,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.249e+01 2.614e+01 2.889e+01 5.523e+01, threshold=5.228e+01, percent-clipped=1.0 2024-08-19 19:45:58,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4507290.0, ans=0.125 2024-08-19 19:46:00,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4507290.0, ans=0.1 2024-08-19 19:46:07,433 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6150, loss[loss=0.1079, beats_loss=0.01071, ecapa_loss=0.0001119, whisper_loss=0.0961, over 23298.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001401, whisper_loss=0.0892, over 3817838.26 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:46:18,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4507390.0, ans=0.1 2024-08-19 19:46:25,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=22.5 2024-08-19 19:46:28,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4507490.0, ans=0.5 2024-08-19 19:46:29,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4507490.0, ans=0.0 2024-08-19 19:46:30,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.02 vs. 
limit=15.0 2024-08-19 19:46:39,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2024-08-19 19:47:35,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4507790.0, ans=0.0 2024-08-19 19:47:38,072 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6200, loss[loss=0.08415, beats_loss=0.01227, ecapa_loss=0.0001221, whisper_loss=0.07066, over 18421.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01066, ecapa_loss=0.0001398, whisper_loss=0.08863, over 3805951.12 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:48:05,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4507990.0, ans=0.125 2024-08-19 19:48:18,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4508090.0, ans=0.125 2024-08-19 19:48:20,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4508090.0, ans=0.1 2024-08-19 19:48:22,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4508090.0, ans=0.125 2024-08-19 19:48:26,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4508090.0, ans=0.0 2024-08-19 19:48:34,514 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
19 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-19 19:48:34,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4508190.0, ans=0.125 2024-08-19 19:48:59,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.361e+01 2.656e+01 2.980e+01 4.502e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-19 19:49:14,603 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6250, loss[loss=0.1184, beats_loss=0.009592, ecapa_loss=0.0001289, whisper_loss=0.1076, over 17125.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01071, ecapa_loss=0.0001394, whisper_loss=0.0883, over 3809418.65 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:49:32,705 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.841e-01 2024-08-19 19:49:38,784 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 19:50:02,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4508590.0, ans=0.0 2024-08-19 19:50:25,604 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 19:50:29,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4508690.0, ans=0.5 2024-08-19 19:50:48,574 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 19:50:52,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.69 vs. limit=22.5 2024-08-19 19:50:55,203 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6300, loss[loss=0.1081, beats_loss=0.01161, ecapa_loss=0.0001262, whisper_loss=0.09519, over 22355.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01068, ecapa_loss=0.0001399, whisper_loss=0.08893, over 3829724.65 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:51:03,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4508890.0, ans=0.09899494936611666 2024-08-19 19:51:12,650 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-19 19:51:16,931 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 19:51:20,543 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 19:51:42,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4509090.0, ans=0.125 2024-08-19 19:51:47,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4509090.0, ans=0.1 2024-08-19 19:52:09,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4509190.0, ans=0.125 2024-08-19 19:52:20,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.295e+01 2.454e+01 2.761e+01 3.903e+01, threshold=4.908e+01, percent-clipped=0.0 2024-08-19 19:52:31,839 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-19 19:52:34,340 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6350, loss[loss=0.09397, beats_loss=0.01021, ecapa_loss=0.0001543, whisper_loss=0.08221, over 18618.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01061, ecapa_loss=0.0001407, whisper_loss=0.0892, over 3841729.65 frames. 
], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:52:41,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-19 19:52:44,674 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 33 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 19:52:52,515 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 15 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 19:52:58,911 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 19:53:05,584 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 19:53:50,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4509690.0, ans=0.125 2024-08-19 19:54:13,966 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6400, loss[loss=0.1035, beats_loss=0.01008, ecapa_loss=0.0001695, whisper_loss=0.09174, over 17348.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001401, whisper_loss=0.09005, over 3862064.83 frames. ], batch size: 71, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:54:32,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4509890.0, ans=0.125 2024-08-19 19:54:48,410 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 19:54:48,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. 
limit=6.0 2024-08-19 19:55:04,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4510090.0, ans=10.0 2024-08-19 19:55:10,563 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=12.0 2024-08-19 19:55:13,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4510190.0, ans=0.1 2024-08-19 19:55:17,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.57 vs. limit=22.5 2024-08-19 19:55:26,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-08-19 19:55:28,383 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 29 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 19:55:31,628 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-19 19:55:35,370 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 19:55:37,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4510290.0, ans=0.125 2024-08-19 19:55:38,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.373e+01 2.627e+01 3.161e+01 1.061e+02, threshold=5.254e+01, percent-clipped=1.0 2024-08-19 19:55:41,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4510290.0, ans=0.2 2024-08-19 19:55:51,587 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6450, loss[loss=0.1036, beats_loss=0.01033, ecapa_loss=0.0001503, whisper_loss=0.09172, over 16581.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001421, whisper_loss=0.09037, over 3859142.70 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:56:05,421 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 25 from LS+wenet, 14 from Vox, 15 fro AS 2024-08-19 19:56:07,160 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 19:56:50,413 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-19 19:56:55,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4510690.0, ans=0.0 2024-08-19 19:57:27,122 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 19:57:28,134 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6500, loss[loss=0.1077, beats_loss=0.009755, ecapa_loss=0.0001653, whisper_loss=0.09633, over 18952.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001419, whisper_loss=0.09004, over 3861204.97 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:57:38,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2024-08-19 19:57:39,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4510890.0, ans=0.125 2024-08-19 19:58:08,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4511090.0, ans=0.125 2024-08-19 19:58:41,039 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 19:58:43,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.329e+01 2.544e+01 2.954e+01 4.370e+01, threshold=5.088e+01, percent-clipped=0.0 2024-08-19 19:58:55,528 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6550, loss[loss=0.08067, beats_loss=0.01349, ecapa_loss=0.0001155, whisper_loss=0.06603, over 18601.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09059, over 3850647.34 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:59:04,021 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 19:59:07,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=4511390.0, ans=12.0 2024-08-19 19:59:15,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4511490.0, ans=0.125 2024-08-19 19:59:21,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4511490.0, ans=0.125 2024-08-19 19:59:28,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4511590.0, ans=0.125 2024-08-19 19:59:32,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2024-08-19 19:59:47,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.86 vs. 
limit=22.5
2024-08-19 19:59:49,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4511690.0, ans=0.125
2024-08-19 20:00:07,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0
2024-08-19 20:00:08,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4511790.0, ans=0.125
2024-08-19 20:00:17,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4511790.0, ans=0.125
2024-08-19 20:00:21,665 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6600, loss[loss=0.1006, beats_loss=0.009024, ecapa_loss=0.0001461, whisper_loss=0.09014, over 19950.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001429, whisper_loss=0.09, over 3850495.42 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:00:49,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4511990.0, ans=0.0
2024-08-19 20:01:34,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.409e+01 2.632e+01 2.888e+01 4.355e+02, threshold=5.264e+01, percent-clipped=1.0
2024-08-19 20:01:36,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4512290.0, ans=0.1
2024-08-19 20:01:45,089 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6650, loss[loss=0.0989, beats_loss=0.01084, ecapa_loss=0.0001246, whisper_loss=0.08681, over 23163.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001429, whisper_loss=0.08964, over 3832253.37 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:01:52,865 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS
2024-08-19 20:01:55,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4512390.0, ans=0.1
2024-08-19 20:02:02,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0
2024-08-19 20:02:14,801 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS
2024-08-19 20:02:17,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4512590.0, ans=0.125
2024-08-19 20:02:18,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4512590.0, ans=0.0
2024-08-19 20:02:24,520 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS
2024-08-19 20:02:45,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0
2024-08-19 20:02:46,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4512690.0, ans=0.125
2024-08-19 20:02:56,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4512790.0, ans=0.2
2024-08-19 20:03:06,964 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6700, loss[loss=0.1015, beats_loss=0.01182, ecapa_loss=0.0001371, whisper_loss=0.08836, over 21179.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001435, whisper_loss=0.08993, over 3823265.68 frames. ], batch size: 86, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:03:07,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4512890.0, ans=0.0
2024-08-19 20:03:18,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4512890.0, ans=0.125
2024-08-19 20:03:24,087 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS
2024-08-19 20:03:45,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0
2024-08-19 20:04:00,324 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 29 from LS+wenet, 23 from Vox, 25 fro AS
2024-08-19 20:04:05,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4513190.0, ans=0.2
2024-08-19 20:04:12,764 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS
2024-08-19 20:04:14,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4513290.0, ans=0.07
2024-08-19 20:04:20,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.359e+01 2.752e+01 3.006e+01 5.924e+01, threshold=5.504e+01, percent-clipped=1.0
2024-08-19 20:04:23,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0
2024-08-19 20:04:31,006 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 fro AS
2024-08-19 20:04:31,943 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6750, loss[loss=0.08451, beats_loss=0.01177, ecapa_loss=0.0001182, whisper_loss=0.07156, over 17145.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001437, whisper_loss=0.09075, over 3859945.60 frames. ], batch size: 69, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:04:36,396 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-19 20:04:39,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4513390.0, ans=0.1
2024-08-19 20:04:58,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4513490.0, ans=0.1
2024-08-19 20:05:06,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4513590.0, ans=0.2
2024-08-19 20:05:18,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0
2024-08-19 20:05:41,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4513790.0, ans=0.125
2024-08-19 20:05:49,780 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS
2024-08-19 20:05:56,268 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6800, loss[loss=0.1034, beats_loss=0.01019, ecapa_loss=0.0001359, whisper_loss=0.09187, over 20372.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0103, ecapa_loss=0.0001443, whisper_loss=0.09094, over 3851805.18 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:05:59,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4513890.0, ans=0.1
2024-08-19 20:06:02,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4513890.0, ans=0.0
2024-08-19 20:06:15,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4513990.0, ans=0.1
2024-08-19 20:06:32,318 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS
2024-08-19 20:06:35,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4514090.0, ans=0.0
2024-08-19 20:06:37,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0
2024-08-19 20:06:45,032 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS
2024-08-19 20:06:51,476 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS
2024-08-19 20:07:03,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4514290.0, ans=0.1
2024-08-19 20:07:08,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.317e+01 2.530e+01 2.822e+01 4.267e+01, threshold=5.060e+01, percent-clipped=0.0
2024-08-19 20:07:11,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4514290.0, ans=0.125
2024-08-19 20:07:17,726 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6850, loss[loss=0.1024, beats_loss=0.009972, ecapa_loss=0.0001373, whisper_loss=0.09106, over 16448.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001442, whisper_loss=0.09059, over 3837239.64 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:07:21,890 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 21 from Vox, 21 fro AS
2024-08-19 20:07:24,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0
2024-08-19 20:07:25,999 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.85 vs. limit=12.0
2024-08-19 20:07:31,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4514390.0, ans=0.2
2024-08-19 20:07:56,919 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 21 from LS+wenet, 22 from Vox, 48 fro AS
2024-08-19 20:08:06,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4514690.0, ans=0.0
2024-08-19 20:08:11,892 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS
2024-08-19 20:08:15,935 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 25 from LS+wenet, 8 from Vox, 20 fro AS
2024-08-19 20:08:25,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=15.0
2024-08-19 20:08:26,641 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS
2024-08-19 20:08:40,596 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6900, loss[loss=0.1086, beats_loss=0.009362, ecapa_loss=0.0001718, whisper_loss=0.09749, over 17910.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001429, whisper_loss=0.0903, over 3824048.11 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:08:41,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4514890.0, ans=0.2
2024-08-19 20:09:07,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4514990.0, ans=0.0
2024-08-19 20:09:37,788 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS
2024-08-19 20:09:44,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4515290.0, ans=0.125
2024-08-19 20:09:49,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.290e+01 2.485e+01 2.784e+01 7.248e+01, threshold=4.970e+01, percent-clipped=1.0
2024-08-19 20:09:59,383 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 6950, loss[loss=0.1073, beats_loss=0.01095, ecapa_loss=0.0001559, whisper_loss=0.09483, over 14339.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.000143, whisper_loss=0.09067, over 3808979.80 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:10:22,314 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS
2024-08-19 20:10:25,324 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS
2024-08-19 20:10:37,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.59 vs. limit=10.0
2024-08-19 20:10:39,582 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 31 from Vox, 32 fro AS
2024-08-19 20:10:54,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4515690.0, ans=0.0
2024-08-19 20:11:00,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0
2024-08-19 20:11:02,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4515790.0, ans=0.125
2024-08-19 20:11:05,448 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-19 20:11:16,651 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS
2024-08-19 20:11:19,821 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7000, loss[loss=0.1154, beats_loss=0.0106, ecapa_loss=0.0001256, whisper_loss=0.1035, over 17040.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.09041, over 3806752.86 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:11:23,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4515890.0, ans=0.125
2024-08-19 20:11:28,825 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS
2024-08-19 20:12:07,894 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS
2024-08-19 20:12:16,776 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-19 20:12:18,249 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 13 from LS+wenet, 16 from Vox, 23 fro AS
2024-08-19 20:12:32,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.294e+01 2.487e+01 2.816e+01 5.941e+01, threshold=4.975e+01, percent-clipped=1.0
2024-08-19 20:12:40,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4516390.0, ans=0.2
2024-08-19 20:12:41,755 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7050, loss[loss=0.1152, beats_loss=0.01183, ecapa_loss=0.0001356, whisper_loss=0.102, over 23547.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001431, whisper_loss=0.09002, over 3811807.15 frames. ], batch size: 97, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:13:01,891 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS
2024-08-19 20:13:02,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4516490.0, ans=0.125
2024-08-19 20:13:05,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4516490.0, ans=0.125
2024-08-19 20:13:15,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5
2024-08-19 20:13:34,074 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 18 from LS+wenet, 13 from Vox, 18 fro AS
2024-08-19 20:13:43,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4516690.0, ans=0.125
2024-08-19 20:13:53,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4516790.0, ans=0.0
2024-08-19 20:13:56,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4516790.0, ans=0.0
2024-08-19 20:13:58,461 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 27 from LS+wenet, 17 from Vox, 42 fro AS
2024-08-19 20:14:07,909 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS
2024-08-19 20:14:08,904 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7100, loss[loss=0.08258, beats_loss=0.01176, ecapa_loss=0.000151, whisper_loss=0.06931, over 17809.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001428, whisper_loss=0.09041, over 3807586.05 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:14:25,967 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS
2024-08-19 20:14:30,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4516990.0, ans=0.2
2024-08-19 20:14:37,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4516990.0, ans=0.125
2024-08-19 20:14:52,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4517090.0, ans=0.2
2024-08-19 20:14:54,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4517090.0, ans=0.1
2024-08-19 20:15:02,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4517190.0, ans=0.04949747468305833
2024-08-19 20:15:05,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517190.0, ans=0.1
2024-08-19 20:15:09,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4517190.0, ans=0.0
2024-08-19 20:15:18,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4517290.0, ans=0.125
2024-08-19 20:15:21,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.242e+01 2.446e+01 2.720e+01 3.661e+01, threshold=4.892e+01, percent-clipped=0.0
2024-08-19 20:15:22,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4517290.0, ans=0.2
2024-08-19 20:15:31,631 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7150, loss[loss=0.1142, beats_loss=0.008203, ecapa_loss=0.0001725, whisper_loss=0.1043, over 16090.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.08943, over 3786005.57 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:15:32,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4517390.0, ans=0.0
2024-08-19 20:15:44,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4517390.0, ans=0.125
2024-08-19 20:15:52,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4517490.0, ans=0.1
2024-08-19 20:15:58,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0
2024-08-19 20:16:22,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4517690.0, ans=0.125
2024-08-19 20:16:25,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4517690.0, ans=0.1
2024-08-19 20:16:32,769 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 14 from Vox, 37 fro AS
2024-08-19 20:16:34,358 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 fro AS
2024-08-19 20:16:45,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4517790.0, ans=0.125
2024-08-19 20:16:55,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7200, loss[loss=0.1085, beats_loss=0.008088, ecapa_loss=0.0001777, whisper_loss=0.0986, over 21926.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01058, ecapa_loss=0.0001429, whisper_loss=0.08852, over 3794149.49 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:17:22,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4517990.0, ans=0.0
2024-08-19 20:17:31,415 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-19 20:18:08,008 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.204e+01 2.383e+01 2.664e+01 1.113e+02, threshold=4.766e+01, percent-clipped=1.0
2024-08-19 20:18:15,138 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 25 from LS+wenet, 28 from Vox, 30 fro AS
2024-08-19 20:18:17,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4518390.0, ans=0.1
2024-08-19 20:18:18,050 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7250, loss[loss=0.09043, beats_loss=0.008789, ecapa_loss=0.0001209, whisper_loss=0.08043, over 17222.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001432, whisper_loss=0.08926, over 3771337.36 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:18:29,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4518390.0, ans=0.125
2024-08-19 20:18:33,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4518490.0, ans=0.04949747468305833
2024-08-19 20:18:35,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4518490.0, ans=0.1
2024-08-19 20:18:44,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4518490.0, ans=0.0
2024-08-19 20:18:59,362 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 27 from Vox, 27 fro AS
2024-08-19 20:19:18,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4518690.0, ans=0.125
2024-08-19 20:19:37,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4518790.0, ans=0.025
2024-08-19 20:19:37,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4518790.0, ans=0.0
2024-08-19 20:19:39,722 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7300, loss[loss=0.11, beats_loss=0.008285, ecapa_loss=0.0001118, whisper_loss=0.1006, over 18874.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001433, whisper_loss=0.08909, over 3777427.66 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:19:49,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4518890.0, ans=0.0
2024-08-19 20:19:50,003 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 20:19:55,858 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 33 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-19 20:19:55,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4518990.0, ans=0.125
2024-08-19 20:20:10,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4519090.0, ans=0.0
2024-08-19 20:20:19,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4519090.0, ans=0.125
2024-08-19 20:20:21,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=22.5
2024-08-19 20:20:23,264 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09269597381353378, model_norm_threshold=47.66118240356445
2024-08-19 20:20:23,426 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.810e+04, grad_sumsq=3.810e+04, orig_rms_sq=1.000e+00
2024-08-19 20:20:26,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4519090.0, ans=0.125
2024-08-19 20:20:27,610 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 17 from LS+wenet, 28 from Vox, 34 fro AS
2024-08-19 20:20:36,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4519190.0, ans=0.1
2024-08-19 20:20:49,830 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0
2024-08-19 20:20:53,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.360e+01 2.664e+01 3.085e+01 5.142e+02, threshold=5.329e+01, percent-clipped=3.0
2024-08-19 20:21:04,144 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7350, loss[loss=0.1069, beats_loss=0.0108, ecapa_loss=0.0001411, whisper_loss=0.09472, over 22698.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001429, whisper_loss=0.08928, over 3809540.24 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:21:06,858 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS
2024-08-19 20:21:14,110 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS
2024-08-19 20:21:14,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.44 vs. limit=22.5
2024-08-19 20:21:38,060 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS
2024-08-19 20:22:18,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4519790.0, ans=0.5
2024-08-19 20:22:28,808 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS
2024-08-19 20:22:35,260 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7400, loss[loss=0.0879, beats_loss=0.008595, ecapa_loss=0.0001291, whisper_loss=0.07802, over 17622.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001441, whisper_loss=0.08937, over 3806238.04 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:22:36,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4519890.0, ans=0.0
2024-08-19 20:22:55,724 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 38 from LS+wenet, 19 from Vox, 33 fro AS
2024-08-19 20:23:08,655 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 24 from LS+wenet, 32 from Vox, 34 fro AS
2024-08-19 20:23:09,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2024-08-19 20:23:18,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4520090.0, ans=0.1
2024-08-19 20:23:21,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4520090.0, ans=0.125
2024-08-19 20:23:28,358 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 13 from LS+wenet, 13 from Vox, 24 fro AS
2024-08-19 20:23:48,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4520290.0, ans=0.0
2024-08-19 20:23:53,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0
2024-08-19 20:23:53,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.289e+01 2.494e+01 2.789e+01 3.959e+01, threshold=4.988e+01, percent-clipped=0.0
2024-08-19 20:23:58,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4520290.0, ans=22.5
2024-08-19 20:24:00,283 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 fro AS
2024-08-19 20:24:04,771 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7450, loss[loss=0.07495, beats_loss=0.01107, ecapa_loss=0.000177, whisper_loss=0.06211, over 18848.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001435, whisper_loss=0.08912, over 3781325.39 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:24:28,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4520490.0, ans=0.125
2024-08-19 20:24:39,168 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS
2024-08-19 20:24:42,558 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.009e-01
2024-08-19 20:24:48,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4520590.0, ans=0.125
2024-08-19 20:24:50,716 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 25 from LS+wenet, 22 from Vox, 22 fro AS
2024-08-19 20:25:02,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.45 vs. limit=10.0
2024-08-19 20:25:04,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0
2024-08-19 20:25:35,134 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7500, loss[loss=0.09363, beats_loss=0.01172, ecapa_loss=0.0001706, whisper_loss=0.08021, over 18081.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01033, ecapa_loss=0.0001441, whisper_loss=0.08957, over 3786370.96 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:25:36,082 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS
2024-08-19 20:25:58,192 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS
2024-08-19 20:26:06,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4520990.0, ans=10.0
2024-08-19 20:26:07,338 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS
2024-08-19 20:26:16,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4521090.0, ans=0.125
2024-08-19 20:26:24,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4521090.0, ans=0.0
2024-08-19 20:26:50,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=12.0
2024-08-19 20:26:52,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4521290.0, ans=0.0
2024-08-19 20:26:56,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.301e+01 2.566e+01 2.939e+01 6.434e+01, threshold=5.131e+01, percent-clipped=1.0
2024-08-19 20:26:57,161 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS
2024-08-19 20:27:06,799 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7550, loss[loss=0.08504, beats_loss=0.01057, ecapa_loss=0.0001467, whisper_loss=0.07299, over 21624.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001441, whisper_loss=0.08922, over 3779161.78 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:27:07,750 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS
2024-08-19 20:27:21,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4521390.0, ans=0.0
2024-08-19 20:27:35,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=15.0
2024-08-19 20:27:39,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4521490.0, ans=0.0
2024-08-19 20:27:47,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4521590.0, ans=0.2
2024-08-19 20:27:49,195 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 28 from LS+wenet, 13 from Vox, 30 fro AS
2024-08-19 20:28:02,911 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS
2024-08-19 20:28:12,074 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.181e-02
2024-08-19 20:28:40,256 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7600, loss[loss=0.09118, beats_loss=0.01035, ecapa_loss=0.0001682, whisper_loss=0.07915, over 15744.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.000144, whisper_loss=0.08965, over 3795576.16 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:28:41,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4521890.0, ans=0.0
2024-08-19 20:28:48,532 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 fro AS
2024-08-19 20:28:58,911 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS
2024-08-19 20:29:13,781 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 fro AS
2024-08-19 20:29:28,704 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS
2024-08-19 20:30:04,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.242e+01 2.534e+01 2.867e+01 1.676e+03, threshold=5.067e+01, percent-clipped=0.0
2024-08-19 20:30:04,597 WARNING [optim.py:496] (2/4) Scaling gradients by 0.030242323875427246, model_norm_threshold=50.67152404785156
2024-08-19 20:30:04,761 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.977e+05, grad_sumsq=7.564e+07, orig_rms_sq=1.055e-02
2024-08-19 20:30:07,154 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 26 from LS+wenet, 20 from Vox, 49 fro AS
2024-08-19 20:30:15,788 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7650, loss[loss=0.07747, beats_loss=0.01539, ecapa_loss=0.0001139, whisper_loss=0.06095, over 22148.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001435, whisper_loss=0.08943, over 3823502.23 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:30:21,389 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0
2024-08-19 20:30:35,244 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=9.181e-02
2024-08-19 20:30:37,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4522490.0, ans=0.1
2024-08-19 20:30:38,908 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.191e+05
2024-08-19 20:30:45,131 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS
2024-08-19 20:30:55,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0
2024-08-19 20:31:02,296 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS
2024-08-19 20:31:07,067 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS
2024-08-19 20:31:17,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4522690.0, ans=0.125
2024-08-19 20:31:26,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4522690.0, ans=0.125
2024-08-19 20:31:50,080 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7700, loss[loss=0.1071, beats_loss=0.009683, ecapa_loss=0.0001301, whisper_loss=0.09609, over 20040.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001434, whisper_loss=0.08931, over 3819054.00 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 20:32:18,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4522990.0, ans=0.0
2024-08-19 20:32:55,171 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 18 from LS+wenet, 38 from Vox, 32 fro AS
2024-08-19 20:33:02,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4523290.0, ans=0.0
2024-08-19 20:33:10,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.323e+01 2.517e+01 2.796e+01 4.474e+01, threshold=5.035e+01, percent-clipped=1.0
2024-08-19 20:33:19,431 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7750, loss[loss=0.09102, beats_loss=0.01276, ecapa_loss=0.0001182, whisper_loss=0.07708, over 22552.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001434, whisper_loss=0.08952, over 3839755.44 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 20:33:31,180 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.235e-01
2024-08-19 20:34:31,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.05 vs. limit=6.0
2024-08-19 20:34:50,102 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7800, loss[loss=0.08673, beats_loss=0.01079, ecapa_loss=0.000141, whisper_loss=0.07453, over 16867.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001419, whisper_loss=0.09002, over 3836728.48 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 20:34:59,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4523890.0, ans=0.0
2024-08-19 20:35:13,418 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 18 from LS+wenet, 31 from Vox, 44 fro AS
2024-08-19 20:35:29,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-08-19 20:35:32,651 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS
2024-08-19 20:35:36,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4524090.0, ans=0.125
2024-08-19 20:35:36,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4524090.0, ans=0.125
2024-08-19 20:35:58,815 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts.
31 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 20:36:00,599 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 20:36:09,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.228e+01 2.464e+01 2.830e+01 4.593e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-19 20:36:10,052 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 20:36:10,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4524290.0, ans=0.1 2024-08-19 20:36:18,138 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7850, loss[loss=0.1262, beats_loss=0.009068, ecapa_loss=0.0001578, whisper_loss=0.1156, over 20393.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001414, whisper_loss=0.08976, over 3840394.22 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:36:29,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4524390.0, ans=0.0 2024-08-19 20:36:33,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4524390.0, ans=0.0 2024-08-19 20:36:52,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4524590.0, ans=0.125 2024-08-19 20:36:53,846 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 20:37:04,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4524590.0, ans=0.0 2024-08-19 20:37:05,935 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 20:37:40,414 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 20:37:42,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-08-19 20:37:46,789 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7900, loss[loss=0.1234, beats_loss=0.008632, ecapa_loss=0.0001433, whisper_loss=0.1133, over 23632.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.09035, over 3824437.75 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:37:55,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4524890.0, ans=0.125 2024-08-19 20:38:38,454 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 20:38:43,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-19 20:38:48,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2024-08-19 20:38:57,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4525290.0, ans=0.1 2024-08-19 20:39:06,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.308e+01 2.634e+01 2.974e+01 4.173e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-19 20:39:12,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4525290.0, ans=0.1 2024-08-19 20:39:15,483 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 7950, loss[loss=0.1134, beats_loss=0.006602, ecapa_loss=0.0001317, whisper_loss=0.1055, over 16814.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.0899, over 3810855.79 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:39:16,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. limit=10.0 2024-08-19 20:39:20,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-19 20:39:32,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4525490.0, ans=0.1 2024-08-19 20:39:38,728 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 20:39:42,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4525490.0, ans=0.1 2024-08-19 20:39:44,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4525490.0, ans=0.5 2024-08-19 20:39:54,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4525590.0, ans=0.0 2024-08-19 20:40:03,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4525590.0, ans=0.2 2024-08-19 20:40:08,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4525690.0, ans=0.2 2024-08-19 20:40:17,185 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 20:40:28,932 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 20:40:35,838 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 20:40:41,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4525890.0, ans=0.125 2024-08-19 20:40:42,436 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8000, loss[loss=0.1168, beats_loss=0.007708, ecapa_loss=0.000194, whisper_loss=0.1071, over 19469.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001409, whisper_loss=0.08969, over 3789160.88 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:41:21,328 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2024-08-19 20:41:28,433 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 20:41:34,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-19 20:41:42,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4526190.0, ans=0.125 2024-08-19 20:41:50,417 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 20:41:58,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4526290.0, ans=0.125 2024-08-19 20:42:05,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.388e+01 2.587e+01 2.894e+01 4.259e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 20:42:15,377 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8050, loss[loss=0.09453, beats_loss=0.01156, ecapa_loss=9.938e-05, whisper_loss=0.08198, over 18817.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001417, whisper_loss=0.08989, over 3804196.24 frames. 
], batch size: 72, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:42:39,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4526490.0, ans=0.125 2024-08-19 20:42:39,798 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-19 20:42:45,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4526490.0, ans=0.125 2024-08-19 20:42:49,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4526490.0, ans=0.125 2024-08-19 20:43:33,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4526790.0, ans=0.125 2024-08-19 20:43:44,721 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 20:43:49,408 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8100, loss[loss=0.0994, beats_loss=0.008263, ecapa_loss=0.0001287, whisper_loss=0.08985, over 14748.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.0001421, whisper_loss=0.08967, over 3783933.08 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:44:06,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4526890.0, ans=0.125 2024-08-19 20:44:13,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4526990.0, ans=0.1 2024-08-19 20:44:21,761 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 20:44:23,247 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 20:44:25,705 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 20:44:32,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-08-19 20:44:37,735 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 20:44:48,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4527190.0, ans=0.2 2024-08-19 20:45:02,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4527190.0, ans=0.2 2024-08-19 20:45:09,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4527290.0, ans=0.125 2024-08-19 20:45:20,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.340e+01 2.531e+01 2.954e+01 4.685e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 20:45:21,699 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 20:45:30,749 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8150, loss[loss=0.1128, beats_loss=0.009727, ecapa_loss=0.0001391, whisper_loss=0.1016, over 19306.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.09, over 3782299.38 frames. 
], batch size: 76, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:46:02,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4527490.0, ans=0.0 2024-08-19 20:46:03,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4527490.0, ans=0.05 2024-08-19 20:46:05,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4527490.0, ans=0.1 2024-08-19 20:46:10,366 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 20:46:30,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4527690.0, ans=0.2 2024-08-19 20:46:32,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4527690.0, ans=0.0 2024-08-19 20:46:36,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4527690.0, ans=0.5 2024-08-19 20:46:44,935 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 20:47:07,990 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8200, loss[loss=0.1137, beats_loss=0.01052, ecapa_loss=0.0001282, whisper_loss=0.1019, over 22673.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.09036, over 3799654.04 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:47:20,696 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 20:48:11,541 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:48:12,934 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 20:48:18,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4528190.0, ans=0.125 2024-08-19 20:48:34,388 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 20:48:35,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.270e+01 2.400e+01 2.604e+01 4.192e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-19 20:48:44,752 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8250, loss[loss=0.07925, beats_loss=0.01327, ecapa_loss=0.0001978, whisper_loss=0.064, over 18248.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001393, whisper_loss=0.09052, over 3807266.11 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:48:55,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4528390.0, ans=0.125 2024-08-19 20:49:05,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4528490.0, ans=0.125 2024-08-19 20:49:32,242 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 22 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 20:49:35,550 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-19 20:49:48,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4528690.0, ans=0.125 2024-08-19 20:49:49,704 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 20:49:59,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4528690.0, ans=0.0 2024-08-19 20:50:05,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4528790.0, ans=0.035 2024-08-19 20:50:05,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-08-19 20:50:20,491 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8300, loss[loss=0.09884, beats_loss=0.008363, ecapa_loss=0.0001395, whisper_loss=0.08908, over 18621.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.09065, over 3815854.96 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:50:46,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4528990.0, ans=0.0 2024-08-19 20:51:07,101 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 20:51:11,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4529090.0, ans=0.125 2024-08-19 20:51:12,862 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 20:51:19,801 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 20:51:21,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4529190.0, ans=0.1 2024-08-19 20:51:23,861 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 20:51:27,397 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 20:51:41,774 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 20:51:42,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.312e+01 2.525e+01 2.734e+01 6.042e+01, threshold=5.050e+01, percent-clipped=1.0 2024-08-19 20:51:51,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=22.5 2024-08-19 20:51:51,662 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8350, loss[loss=0.09106, beats_loss=0.01197, ecapa_loss=0.0001364, whisper_loss=0.07773, over 18571.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.09004, over 3826118.36 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:52:02,247 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-19 20:52:15,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4529490.0, ans=0.1 2024-08-19 20:52:20,033 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 33 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 20:52:23,775 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 20:52:54,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4529690.0, ans=0.0 2024-08-19 20:53:05,776 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 20:53:29,282 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8400, loss[loss=0.09345, beats_loss=0.01057, ecapa_loss=0.00013, whisper_loss=0.08159, over 23155.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.09023, over 3871293.78 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:53:36,998 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 12 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 20:53:38,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4529890.0, ans=0.125 2024-08-19 20:53:45,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4529990.0, ans=0.125 2024-08-19 20:53:53,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4529990.0, ans=0.125 2024-08-19 20:54:16,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4530090.0, ans=0.0 2024-08-19 20:54:27,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4530190.0, ans=0.0 2024-08-19 20:54:40,473 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 20:54:48,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.271e+01 2.577e+01 2.838e+01 4.178e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-19 20:54:59,410 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8450, loss[loss=0.1091, beats_loss=0.008883, ecapa_loss=0.0001243, whisper_loss=0.09895, over 18848.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.08942, over 3845524.32 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:55:14,220 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 20:55:14,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4530390.0, ans=0.05 2024-08-19 20:55:16,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4530390.0, ans=0.125 2024-08-19 20:55:28,316 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-19 20:55:30,134 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 20:55:35,967 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 20:55:55,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=22.5 2024-08-19 20:56:01,626 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 22 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-19 20:56:03,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4530690.0, ans=0.0 2024-08-19 20:56:07,597 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 20:56:34,639 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 20:56:40,128 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8500, loss[loss=0.1041, beats_loss=0.00887, ecapa_loss=0.0001621, whisper_loss=0.09357, over 21892.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001403, whisper_loss=0.0895, over 3873365.46 frames. 
], batch size: 86, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:57:12,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4530990.0, ans=0.125 2024-08-19 20:57:17,180 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-19 20:58:08,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4531290.0, ans=10.0 2024-08-19 20:58:13,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.342e+01 2.609e+01 2.850e+01 2.704e+02, threshold=5.218e+01, percent-clipped=2.0 2024-08-19 20:58:17,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.98 vs. limit=22.5 2024-08-19 20:58:23,656 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8550, loss[loss=0.1124, beats_loss=0.007892, ecapa_loss=0.0001646, whisper_loss=0.1028, over 17070.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0103, ecapa_loss=0.0001418, whisper_loss=0.09021, over 3874540.88 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:58:25,016 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-08-19 20:58:59,469 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 20:59:18,296 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 20:59:25,735 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 20:59:25,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4531690.0, ans=0.05 2024-08-19 20:59:39,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4531790.0, ans=0.09899494936611666 2024-08-19 20:59:59,609 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8600, loss[loss=0.0986, beats_loss=0.01014, ecapa_loss=0.0001606, whisper_loss=0.08686, over 21926.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01027, ecapa_loss=0.000142, whisper_loss=0.09039, over 3849860.59 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:00:20,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4531990.0, ans=0.125 2024-08-19 21:00:27,565 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 30 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 21:00:35,276 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 21:01:04,181 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 27 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 21:01:14,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-19 21:01:22,334 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.31 vs. 
limit=15.0 2024-08-19 21:01:30,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.352e+01 2.546e+01 2.927e+01 4.092e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-19 21:01:39,165 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8650, loss[loss=0.1016, beats_loss=0.008624, ecapa_loss=0.0001473, whisper_loss=0.09153, over 22620.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001423, whisper_loss=0.08971, over 3875616.48 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:01:40,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4532390.0, ans=0.125 2024-08-19 21:02:22,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4532590.0, ans=0.05 2024-08-19 21:02:55,624 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 21:03:13,409 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 29 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 21:03:14,436 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8700, loss[loss=0.1242, beats_loss=0.007095, ecapa_loss=0.0001723, whisper_loss=0.1154, over 19772.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001429, whisper_loss=0.09008, over 3877636.10 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:03:25,701 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
17 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-19 21:03:25,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4532890.0, ans=0.2 2024-08-19 21:03:36,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4532990.0, ans=0.125 2024-08-19 21:03:43,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4532990.0, ans=0.1 2024-08-19 21:04:34,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.263e+01 2.457e+01 2.766e+01 3.922e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-19 21:04:37,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4533290.0, ans=0.0 2024-08-19 21:04:43,641 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8750, loss[loss=0.1046, beats_loss=0.009009, ecapa_loss=0.0001204, whisper_loss=0.0944, over 13883.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0102, ecapa_loss=0.000143, whisper_loss=0.09083, over 3863223.42 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:04:53,140 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 31 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-19 21:04:56,367 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-19 21:05:08,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-19 21:05:41,909 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. 
limit=15.0
2024-08-19 21:05:55,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4533690.0, ans=0.0
2024-08-19 21:05:55,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4533690.0, ans=0.0
2024-08-19 21:06:16,980 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8800, loss[loss=0.101, beats_loss=0.009993, ecapa_loss=0.0001431, whisper_loss=0.08962, over 17506.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01023, ecapa_loss=0.0001421, whisper_loss=0.09075, over 3846940.94 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:06:17,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4533890.0, ans=0.0
2024-08-19 21:06:31,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4533890.0, ans=0.05
2024-08-19 21:06:46,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0
2024-08-19 21:07:00,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4534090.0, ans=0.125
2024-08-19 21:07:02,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs.
limit=6.0
2024-08-19 21:07:26,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4534290.0, ans=0.0
2024-08-19 21:07:33,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.239e+01 2.468e+01 2.713e+01 3.674e+01, threshold=4.936e+01, percent-clipped=0.0
2024-08-19 21:07:34,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4534290.0, ans=0.1
2024-08-19 21:07:42,246 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8850, loss[loss=0.1071, beats_loss=0.01098, ecapa_loss=0.0001197, whisper_loss=0.09495, over 21687.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001408, whisper_loss=0.0904, over 3846556.79 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:07:48,301 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0
2024-08-19 21:07:49,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4534390.0, ans=0.125
2024-08-19 21:07:56,249 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 from AS
2024-08-19 21:08:10,597 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05975715443491936, model_norm_threshold=49.35716247558594
2024-08-19 21:08:10,761 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.671e+04, grad_sumsq=1.079e+05, orig_rms_sq=6.184e-01
2024-08-19 21:08:18,883 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts.
24 from LS+wenet, 21 from Vox, 49 from AS
2024-08-19 21:08:27,896 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2024-08-19 21:08:30,908 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=12.0
2024-08-19 21:08:34,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4534690.0, ans=0.125
2024-08-19 21:08:38,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4534690.0, ans=0.125
2024-08-19 21:09:03,741 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8900, loss[loss=0.09965, beats_loss=0.009827, ecapa_loss=0.0001557, whisper_loss=0.08827, over 14178.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.09011, over 3855894.96 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:09:03,942 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 21:09:07,219 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 22 from LS+wenet, 10 from Vox, 34 from AS
2024-08-19 21:09:13,612 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 from AS
2024-08-19 21:09:19,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.63 vs.
limit=15.0
2024-08-19 21:09:27,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4534990.0, ans=0.125
2024-08-19 21:09:31,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4534990.0, ans=0.125
2024-08-19 21:09:54,837 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS
2024-08-19 21:09:56,162 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 from AS
2024-08-19 21:10:14,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.373e+01 2.651e+01 2.938e+01 8.260e+02, threshold=5.301e+01, percent-clipped=3.0
2024-08-19 21:10:20,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4535290.0, ans=0.125
2024-08-19 21:10:22,802 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 8950, loss[loss=0.1103, beats_loss=0.01026, ecapa_loss=0.0001714, whisper_loss=0.09837, over 17309.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.09057, over 3863391.09 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:10:29,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0
2024-08-19 21:10:45,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4535490.0, ans=0.125
2024-08-19 21:11:03,614 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts.
24 from LS+wenet, 17 from Vox, 38 from AS
2024-08-19 21:11:03,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4535590.0, ans=0.125
2024-08-19 21:11:11,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4535590.0, ans=0.125
2024-08-19 21:11:14,203 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 from AS
2024-08-19 21:11:26,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4535690.0, ans=0.125
2024-08-19 21:11:31,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4535790.0, ans=0.0
2024-08-19 21:11:31,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4535790.0, ans=0.125
2024-08-19 21:11:39,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4535790.0, ans=0.125
2024-08-19 21:11:50,877 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9000, loss[loss=0.1113, beats_loss=0.01075, ecapa_loss=0.0001484, whisper_loss=0.09903, over 22936.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.09044, over 3862822.76 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:11:50,877 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss
2024-08-19 21:12:26,876 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005115, whisper_loss=0.248, over 931116.00 frames.
2024-08-19 21:12:49,644 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003978, beats_loss=0, ecapa_loss=0.0003978, whisper_loss=0, over 944235.00 frames.
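The per-batch records above report `loss` alongside `beats_loss`, `ecapa_loss`, and `whisper_loss` components. With the scales from this run's configuration in the header (`beats_loss_scale` 1.0, ecapa scale 10.0), the logged totals are reproduced as a weighted sum of the three distillation heads. Below is a minimal sketch of that combination; it is an illustration consistent with the logged numbers, not the actual `train_multi_KD3.py` code, and the function name is hypothetical.

```python
def combined_kd_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float,
                     beats_scale: float = 1.0, ecapa_scale: float = 10.0,
                     whisper_scale: float = 1.0) -> float:
    """Weighted sum of the three KD heads (assumed scales from the run config)."""
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Batch 9000 record: loss=0.1023, beats_loss=0.01042,
# ecapa_loss=0.0001387, whisper_loss=0.09044
total = combined_kd_loss(0.01042, 0.0001387, 0.09044)
assert abs(total - 0.1023) < 5e-4
```

The same weights also reproduce, for example, the batch 8650 record (0.01031 + 10 × 0.0001423 + 0.08971 ≈ 0.1014), which suggests the ecapa term is up-weighted by a factor of 10 in the total.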
2024-08-19 21:14:27,932 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-19 21:14:27,935 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB
2024-08-19 21:14:28,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4535890.0, ans=0.125
2024-08-19 21:14:37,681 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 25 from LS+wenet, 22 from Vox, 23 from AS
2024-08-19 21:14:40,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4535890.0, ans=0.125
2024-08-19 21:14:57,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0
2024-08-19 21:15:01,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4535990.0, ans=0.2
2024-08-19 21:15:21,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4536090.0, ans=0.125
2024-08-19 21:16:01,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.332e+01 2.634e+01 2.859e+01 8.780e+01, threshold=5.268e+01, percent-clipped=1.0
2024-08-19 21:16:05,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4536290.0, ans=0.125
2024-08-19 21:16:11,962 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9050, loss[loss=0.1118, beats_loss=0.00923, ecapa_loss=0.0001309, whisper_loss=0.1013, over 17018.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001387, whisper_loss=0.09001, over 3856941.37 frames.
], batch size: 62, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:16:26,593 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 21:16:26,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4536390.0, ans=0.125
2024-08-19 21:16:32,104 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 8 from Vox, 26 from AS
2024-08-19 21:16:52,225 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07672171294689178, model_norm_threshold=52.675148010253906
2024-08-19 21:16:52,388 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.217e+05, grad_sumsq=1.217e+05, orig_rms_sq=1.000e+00
2024-08-19 21:16:55,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4536590.0, ans=0.0
2024-08-19 21:17:15,655 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 33 from LS+wenet, 18 from Vox, 20 from AS
2024-08-19 21:17:23,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4536690.0, ans=0.125
2024-08-19 21:17:25,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4536690.0, ans=0.125
2024-08-19 21:17:30,907 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 from AS
2024-08-19 21:17:52,594 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9100, loss[loss=0.1133, beats_loss=0.01075, ecapa_loss=0.0001322, whisper_loss=0.1012, over 22425.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.000139, whisper_loss=0.08996, over 3856926.15 frames.
], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:17:55,197 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 from AS
2024-08-19 21:17:55,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=12.0
2024-08-19 21:18:19,213 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS
2024-08-19 21:18:32,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4536990.0, ans=0.05
2024-08-19 21:18:38,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4537090.0, ans=0.0
2024-08-19 21:18:40,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4537090.0, ans=0.2
2024-08-19 21:18:41,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4537090.0, ans=0.125
2024-08-19 21:18:43,309 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 from AS
2024-08-19 21:18:45,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4537090.0, ans=0.125
2024-08-19 21:18:49,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2024-08-19 21:18:56,433 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts.
25 from LS+wenet, 30 from Vox, 40 from AS
2024-08-19 21:19:19,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4537290.0, ans=0.125
2024-08-19 21:19:24,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.187e+01 2.511e+01 2.857e+01 6.866e+02, threshold=5.022e+01, percent-clipped=2.0
2024-08-19 21:19:34,240 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9150, loss[loss=0.09547, beats_loss=0.01149, ecapa_loss=0.0001149, whisper_loss=0.08283, over 21041.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001389, whisper_loss=0.09037, over 3844971.65 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:20:00,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4537490.0, ans=0.0
2024-08-19 21:20:30,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4537690.0, ans=0.125
2024-08-19 21:20:34,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4537690.0, ans=0.0
2024-08-19 21:20:55,017 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 from AS
2024-08-19 21:21:09,826 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9200, loss[loss=0.08616, beats_loss=0.01054, ecapa_loss=0.0001287, whisper_loss=0.07434, over 14668.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001383, whisper_loss=0.09037, over 3819594.30 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:21:14,798 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts.
24 from LS+wenet, 20 from Vox, 27 from AS
2024-08-19 21:21:16,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4537890.0, ans=0.2
2024-08-19 21:21:25,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0
2024-08-19 21:21:26,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4537890.0, ans=0.2
2024-08-19 21:21:38,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0
2024-08-19 21:21:41,734 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 16 from LS+wenet, 20 from Vox, 18 from AS
2024-08-19 21:21:51,098 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 from AS
2024-08-19 21:22:00,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-19 21:22:02,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4538090.0, ans=0.125
2024-08-19 21:22:12,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4538190.0, ans=0.2
2024-08-19 21:22:22,394 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.853e+00
2024-08-19 21:22:29,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4538290.0, ans=0.125
2024-08-19 21:22:34,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs.
limit=15.0
2024-08-19 21:22:39,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.332e+01 2.620e+01 2.969e+01 6.711e+01, threshold=5.240e+01, percent-clipped=2.0
2024-08-19 21:22:49,872 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9250, loss[loss=0.1098, beats_loss=0.01139, ecapa_loss=0.0001261, whisper_loss=0.09712, over 14025.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.09055, over 3794701.28 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:22:56,966 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS
2024-08-19 21:23:02,200 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 from AS
2024-08-19 21:23:17,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4538490.0, ans=0.125
2024-08-19 21:23:36,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4538590.0, ans=0.1
2024-08-19 21:23:57,545 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 29 from LS+wenet, 28 from Vox, 39 from AS
2024-08-19 21:24:13,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4538790.0, ans=0.2
2024-08-19 21:24:25,337 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9300, loss[loss=0.08905, beats_loss=0.01191, ecapa_loss=0.0001203, whisper_loss=0.07594, over 22030.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.09055, over 3821917.08 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:24:31,771 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
33 from LS+wenet, 17 from Vox, 40 from AS
2024-08-19 21:24:50,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4538990.0, ans=0.0
2024-08-19 21:24:50,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4538990.0, ans=0.0
2024-08-19 21:25:02,505 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS
2024-08-19 21:25:50,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.320e+01 2.550e+01 2.886e+01 6.142e+01, threshold=5.100e+01, percent-clipped=1.0
2024-08-19 21:26:00,071 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9350, loss[loss=0.09859, beats_loss=0.01178, ecapa_loss=0.0001364, whisper_loss=0.08545, over 23019.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001407, whisper_loss=0.09034, over 3830977.13 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:26:09,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.15 vs. limit=6.0
2024-08-19 21:26:30,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4539490.0, ans=0.125
2024-08-19 21:26:33,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0
2024-08-19 21:27:00,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4539690.0, ans=0.2
2024-08-19 21:27:14,585 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 10 from Vox, 28 from AS
2024-08-19 21:27:16,250 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts.
17 from LS+wenet, 17 from Vox, 33 from AS
2024-08-19 21:27:19,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4539790.0, ans=0.0
2024-08-19 21:27:32,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4539890.0, ans=0.125
2024-08-19 21:27:33,478 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9400, loss[loss=0.1252, beats_loss=0.00967, ecapa_loss=0.0001328, whisper_loss=0.1142, over 23663.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.09029, over 3817976.04 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:27:38,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4539890.0, ans=0.125
2024-08-19 21:27:42,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4539890.0, ans=0.07
2024-08-19 21:27:42,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=12.0
2024-08-19 21:27:44,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4539890.0, ans=0.0
2024-08-19 21:28:14,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4540090.0, ans=0.125
2024-08-19 21:28:44,677 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts.
13 from LS+wenet, 34 from Vox, 34 from AS
2024-08-19 21:28:44,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4540190.0, ans=0.0
2024-08-19 21:28:57,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.227e+01 2.494e+01 2.747e+01 6.860e+01, threshold=4.987e+01, percent-clipped=1.0
2024-08-19 21:29:00,415 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 from AS
2024-08-19 21:29:00,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540290.0, ans=0.1
2024-08-19 21:29:04,101 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS
2024-08-19 21:29:07,224 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9450, loss[loss=0.1205, beats_loss=0.01008, ecapa_loss=0.0001318, whisper_loss=0.1091, over 21638.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.09043, over 3832681.82 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:29:10,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4540390.0, ans=0.2
2024-08-19 21:29:23,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4540390.0, ans=0.125
2024-08-19 21:29:46,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0
2024-08-19 21:29:55,595 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 17 from LS+wenet, 26 from Vox, 37 from AS
2024-08-19 21:30:11,940 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
21 from LS+wenet, 26 from Vox, 43 from AS
2024-08-19 21:30:22,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4540690.0, ans=0.125
2024-08-19 21:30:41,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540790.0, ans=0.1
2024-08-19 21:30:43,131 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 from AS
2024-08-19 21:30:47,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540890.0, ans=0.1
2024-08-19 21:30:48,107 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9500, loss[loss=0.1043, beats_loss=0.01006, ecapa_loss=0.000163, whisper_loss=0.09263, over 15734.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09107, over 3845278.02 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:31:03,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4540890.0, ans=0.0
2024-08-19 21:31:05,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4540890.0, ans=0.125
2024-08-19 21:31:06,646 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
37 from LS+wenet, 21 from Vox, 32 from AS
2024-08-19 21:31:11,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4540990.0, ans=0.0
2024-08-19 21:31:46,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4541190.0, ans=0.125
2024-08-19 21:31:57,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4541190.0, ans=0.125
2024-08-19 21:32:17,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.220e+01 2.482e+01 2.741e+01 4.040e+01, threshold=4.965e+01, percent-clipped=0.0
2024-08-19 21:32:20,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0
2024-08-19 21:32:22,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4541290.0, ans=0.025
2024-08-19 21:32:27,361 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9550, loss[loss=0.09645, beats_loss=0.01061, ecapa_loss=0.0001409, whisper_loss=0.08443, over 23111.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001402, whisper_loss=0.09019, over 3844034.94 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:32:47,903 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
21 from LS+wenet, 20 from Vox, 49 from AS
2024-08-19 21:33:03,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4541490.0, ans=0.125
2024-08-19 21:33:21,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4541590.0, ans=0.2
2024-08-19 21:33:36,146 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=12.0
2024-08-19 21:33:41,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4541690.0, ans=0.0
2024-08-19 21:33:45,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4541790.0, ans=0.2
2024-08-19 21:33:46,931 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS
2024-08-19 21:34:02,969 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9600, loss[loss=0.1015, beats_loss=0.01234, ecapa_loss=0.0001309, whisper_loss=0.08784, over 21866.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001401, whisper_loss=0.08928, over 3809929.34 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:34:04,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4541890.0, ans=0.125
2024-08-19 21:34:25,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4541990.0, ans=0.125
2024-08-19 21:34:25,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4541990.0, ans=0.2
2024-08-19 21:34:38,117 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts.
23 from LS+wenet, 9 from Vox, 19 from AS
2024-08-19 21:35:09,630 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 30 from Vox, 28 from AS
2024-08-19 21:35:11,880 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 from AS
2024-08-19 21:35:15,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4542190.0, ans=0.0
2024-08-19 21:35:25,469 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 24 from LS+wenet, 13 from Vox, 39 from AS
2024-08-19 21:35:34,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.509e+01 2.746e+01 4.719e+01, threshold=5.019e+01, percent-clipped=0.0
2024-08-19 21:35:45,882 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9650, loss[loss=0.09585, beats_loss=0.01069, ecapa_loss=0.0001566, whisper_loss=0.08359, over 16150.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001401, whisper_loss=0.08959, over 3805042.26 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 21:35:50,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4542390.0, ans=0.1
2024-08-19 21:36:08,098 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 from AS
2024-08-19 21:36:10,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4542490.0, ans=0.2
2024-08-19 21:36:22,601 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0
2024-08-19 21:36:49,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs.
limit=15.0 2024-08-19 21:37:26,852 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:37:29,850 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9700, loss[loss=0.1095, beats_loss=0.01084, ecapa_loss=0.000129, whisper_loss=0.09741, over 22209.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01058, ecapa_loss=0.00014, whisper_loss=0.08899, over 3847167.94 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:38:09,777 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-19 21:38:10,039 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.228e-01 2024-08-19 21:38:12,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4543090.0, ans=0.0 2024-08-19 21:38:14,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4543090.0, ans=0.125 2024-08-19 21:38:24,321 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 11 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 21:38:45,369 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 37 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 21:39:02,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.331e+01 2.511e+01 2.867e+01 4.334e+02, threshold=5.023e+01, percent-clipped=2.0 2024-08-19 21:39:09,510 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 21:39:11,112 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-19 21:39:12,104 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9750, loss[loss=0.08513, beats_loss=0.01454, ecapa_loss=0.0001275, whisper_loss=0.06931, over 20763.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.08925, over 3849389.76 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:39:50,646 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-19 21:39:52,483 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:39:55,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4543590.0, ans=0.0 2024-08-19 21:40:08,139 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 21:40:08,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4543690.0, ans=0.125 2024-08-19 21:40:22,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-19 21:40:32,236 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5 2024-08-19 21:40:42,267 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 21:40:46,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4543790.0, ans=0.1 2024-08-19 21:40:48,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=4543890.0, ans=8.0 2024-08-19 21:40:48,534 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9800, loss[loss=0.1144, beats_loss=0.01153, ecapa_loss=0.0001291, whisper_loss=0.1015, over 20250.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001407, whisper_loss=0.08958, over 3837588.55 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:40:59,540 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 21:41:28,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4544090.0, ans=0.125 2024-08-19 21:41:51,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4544190.0, ans=0.125 2024-08-19 21:42:08,446 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 21:42:10,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4544290.0, ans=0.2 2024-08-19 21:42:16,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.232e+01 2.541e+01 2.711e+01 1.410e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 21:42:19,036 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 21:42:21,483 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 34 from Vox, 30 fro AS 2024-08-19 21:42:26,626 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9850, loss[loss=0.0947, beats_loss=0.01155, ecapa_loss=0.0001312, whisper_loss=0.08184, over 22070.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001402, whisper_loss=0.08965, over 3806754.37 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:42:30,619 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 21:42:36,789 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0 2024-08-19 21:42:47,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4544490.0, ans=0.0 2024-08-19 21:42:59,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4544490.0, ans=0.125 2024-08-19 21:43:09,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-19 21:43:23,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4544690.0, ans=0.0 2024-08-19 21:43:37,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4544690.0, ans=0.0 2024-08-19 21:43:40,448 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 21:43:57,255 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9900, loss[loss=0.09754, beats_loss=0.009803, ecapa_loss=0.0001575, whisper_loss=0.08616, over 13592.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001403, whisper_loss=0.08946, over 3785901.17 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:44:21,758 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 10 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 21:44:23,173 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
17 from LS+wenet, 36 from Vox, 38 fro AS 2024-08-19 21:44:23,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4544990.0, ans=0.2 2024-08-19 21:44:54,537 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 21:45:00,026 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 21:45:09,774 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.603e+05 2024-08-19 21:45:17,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.278e+01 2.511e+01 2.738e+01 3.827e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 21:45:25,786 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 9950, loss[loss=0.1005, beats_loss=0.009586, ecapa_loss=0.0001696, whisper_loss=0.08918, over 19444.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001402, whisper_loss=0.08931, over 3788795.54 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:45:26,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4545390.0, ans=0.95 2024-08-19 21:46:13,437 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 14 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 21:46:22,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4545690.0, ans=0.1 2024-08-19 21:46:45,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4545790.0, ans=0.1 2024-08-19 21:46:56,078 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10000, loss[loss=0.09821, beats_loss=0.01143, ecapa_loss=0.0001314, whisper_loss=0.08546, over 22172.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001396, whisper_loss=0.08908, over 3753101.47 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:47:00,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4545890.0, ans=0.125 2024-08-19 21:47:09,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4545890.0, ans=0.125 2024-08-19 21:47:09,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0 2024-08-19 21:47:15,989 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 21:47:29,191 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 31 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 21:47:44,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2024-08-19 21:48:08,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-19 21:48:17,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.220e+01 2.388e+01 2.607e+01 4.207e+01, threshold=4.776e+01, percent-clipped=0.0 2024-08-19 21:48:22,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4546290.0, ans=0.125 2024-08-19 21:48:26,803 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10050, loss[loss=0.117, beats_loss=0.009048, ecapa_loss=0.0001433, whisper_loss=0.1066, over 22480.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001398, whisper_loss=0.08942, over 3759541.41 frames. 
], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:48:31,165 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 18 from LS+wenet, 28 from Vox, 46 fro AS 2024-08-19 21:48:46,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4546490.0, ans=0.125 2024-08-19 21:48:46,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4546490.0, ans=0.07 2024-08-19 21:49:20,897 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 22 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-19 21:49:27,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4546690.0, ans=0.125 2024-08-19 21:49:42,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4546790.0, ans=0.0 2024-08-19 21:49:57,065 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10100, loss[loss=0.09401, beats_loss=0.01047, ecapa_loss=0.0001509, whisper_loss=0.08203, over 21721.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001408, whisper_loss=0.08929, over 3786433.57 frames. ], batch size: 86, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:50:03,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4546890.0, ans=0.09899494936611666 2024-08-19 21:50:13,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4546890.0, ans=0.1 2024-08-19 21:50:18,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4546990.0, ans=0.125 2024-08-19 21:50:24,200 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
24 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-19 21:50:27,837 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 21:50:41,287 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 21:50:49,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4547090.0, ans=0.1 2024-08-19 21:51:06,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4547190.0, ans=0.125 2024-08-19 21:51:15,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4547290.0, ans=0.1 2024-08-19 21:51:19,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4547290.0, ans=0.0 2024-08-19 21:51:21,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.291e+01 2.540e+01 2.770e+01 3.592e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-19 21:51:30,717 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10150, loss[loss=0.1143, beats_loss=0.0112, ecapa_loss=9.956e-05, whisper_loss=0.1021, over 17536.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.08875, over 3773337.85 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:51:44,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4547390.0, ans=0.125 2024-08-19 21:52:18,173 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
27 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 21:52:33,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4547690.0, ans=0.125 2024-08-19 21:52:58,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-08-19 21:53:02,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4547790.0, ans=0.125 2024-08-19 21:53:04,951 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10200, loss[loss=0.1047, beats_loss=0.01131, ecapa_loss=0.0001241, whisper_loss=0.09212, over 23115.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001409, whisper_loss=0.08979, over 3777132.47 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:53:20,095 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 21:53:20,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4547890.0, ans=0.0 2024-08-19 21:54:16,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4548190.0, ans=0.2 2024-08-19 21:54:33,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.328e+01 2.553e+01 2.832e+01 3.795e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-19 21:54:36,272 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 18 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-19 21:54:43,480 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10250, loss[loss=0.0941, beats_loss=0.01165, ecapa_loss=0.0001312, whisper_loss=0.08114, over 20598.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001405, whisper_loss=0.08876, over 3813338.22 frames. 
], batch size: 82, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:54:44,419 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 21:54:46,169 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 21:55:24,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5 2024-08-19 21:56:07,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4548790.0, ans=0.125 2024-08-19 21:56:08,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4548790.0, ans=0.1 2024-08-19 21:56:16,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4548790.0, ans=0.0 2024-08-19 21:56:20,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4548890.0, ans=0.1 2024-08-19 21:56:21,615 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10300, loss[loss=0.1014, beats_loss=0.01194, ecapa_loss=0.0001305, whisper_loss=0.08818, over 22155.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01057, ecapa_loss=0.0001417, whisper_loss=0.08825, over 3827664.81 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:56:25,267 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. 
limit=15.0 2024-08-19 21:56:26,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4548890.0, ans=0.125 2024-08-19 21:56:30,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4548890.0, ans=0.125 2024-08-19 21:56:33,607 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 30 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-19 21:57:03,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2024-08-19 21:57:16,655 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-19 21:57:16,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4549090.0, ans=0.1 2024-08-19 21:57:50,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.288e+01 2.512e+01 2.813e+01 4.612e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-19 21:57:57,473 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 21:57:59,990 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10350, loss[loss=0.1054, beats_loss=0.01063, ecapa_loss=0.000153, whisper_loss=0.09322, over 15136.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01055, ecapa_loss=0.0001413, whisper_loss=0.08872, over 3817241.67 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:58:38,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4549490.0, ans=0.125 2024-08-19 21:58:40,747 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 21:58:56,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4549590.0, ans=0.0 2024-08-19 21:59:16,057 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 21:59:25,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4549790.0, ans=0.125 2024-08-19 21:59:34,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4549790.0, ans=0.2 2024-08-19 21:59:39,691 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10400, loss[loss=0.1024, beats_loss=0.0101, ecapa_loss=0.0001361, whisper_loss=0.09095, over 23085.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.00014, whisper_loss=0.08959, over 3844314.32 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:59:44,703 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 21:59:46,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4549890.0, ans=0.125 2024-08-19 21:59:49,804 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09229938685894012, model_norm_threshold=50.23906326293945 2024-08-19 21:59:49,968 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.110e+06, orig_rms_sq=1.059e-02 2024-08-19 21:59:56,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4549890.0, ans=0.07 2024-08-19 22:00:17,746 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 22:00:26,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=22.5 2024-08-19 22:00:28,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-19 22:01:09,171 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 22:01:14,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.360e+01 2.658e+01 2.930e+01 5.443e+02, threshold=5.316e+01, percent-clipped=2.0 2024-08-19 22:01:23,790 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10450, loss[loss=0.1011, beats_loss=0.01071, ecapa_loss=0.0001731, whisper_loss=0.08862, over 20841.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08979, over 3836818.95 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:01:25,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4550390.0, ans=0.2 2024-08-19 22:01:31,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4550390.0, ans=0.0 2024-08-19 22:01:35,562 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 22:01:50,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4550490.0, ans=0.1 2024-08-19 22:02:11,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.86 vs. 
limit=15.0 2024-08-19 22:02:42,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2024-08-19 22:02:51,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4550790.0, ans=0.025 2024-08-19 22:02:51,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4550790.0, ans=0.125 2024-08-19 22:03:05,348 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 22:03:08,087 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 22:03:09,021 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10500, loss[loss=0.09109, beats_loss=0.01215, ecapa_loss=0.0001322, whisper_loss=0.07761, over 19973.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.000141, whisper_loss=0.0902, over 3851442.31 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:03:18,906 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 22:03:24,098 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 13 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-19 22:03:44,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4550990.0, ans=0.0 2024-08-19 22:03:58,071 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
23 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-19 22:04:02,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4551090.0, ans=0.125 2024-08-19 22:04:07,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4551090.0, ans=0.125 2024-08-19 22:04:07,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-19 22:04:18,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4551190.0, ans=0.035 2024-08-19 22:04:43,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.385e+01 2.732e+01 3.050e+01 1.536e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-19 22:04:47,607 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=22.5 2024-08-19 22:04:54,236 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 22:04:55,236 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10550, loss[loss=0.09326, beats_loss=0.01077, ecapa_loss=0.0001334, whisper_loss=0.08116, over 18284.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.08982, over 3880219.16 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:05:05,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4551390.0, ans=0.1 2024-08-19 22:05:32,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. 
limit=15.0 2024-08-19 22:05:40,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4551590.0, ans=0.125 2024-08-19 22:05:51,927 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 22:06:41,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4551790.0, ans=0.125 2024-08-19 22:06:46,836 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10600, loss[loss=0.1063, beats_loss=0.009726, ecapa_loss=0.000137, whisper_loss=0.09517, over 16036.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.00014, whisper_loss=0.089, over 3840588.99 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:07:07,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4551990.0, ans=0.0 2024-08-19 22:07:25,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4551990.0, ans=0.1 2024-08-19 22:07:28,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4552090.0, ans=0.0 2024-08-19 22:07:50,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4552190.0, ans=0.0 2024-08-19 22:07:51,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4552190.0, ans=0.0 2024-08-19 22:07:59,822 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 35 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 22:08:01,088 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 22:08:02,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-08-19 22:08:15,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4552290.0, ans=0.125 2024-08-19 22:08:17,138 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-19 22:08:21,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.245e+01 2.495e+01 2.796e+01 7.063e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-19 22:08:32,209 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10650, loss[loss=0.09759, beats_loss=0.01068, ecapa_loss=0.0001314, whisper_loss=0.0856, over 19603.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.08944, over 3843387.99 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:08:36,973 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 22:08:37,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-19 22:09:02,237 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 22:09:02,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4552490.0, ans=0.125 2024-08-19 22:09:06,178 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 22:09:15,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4552590.0, ans=0.07 2024-08-19 22:09:33,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-08-19 22:10:02,098 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-08-19 22:10:10,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4552790.0, ans=0.0 2024-08-19 22:10:11,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4552790.0, ans=0.04949747468305833 2024-08-19 22:10:12,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2024-08-19 22:10:15,046 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10700, loss[loss=0.09171, beats_loss=0.006768, ecapa_loss=0.0001578, whisper_loss=0.08336, over 14338.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001398, whisper_loss=0.09024, over 3817924.57 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:10:19,599 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 22:10:42,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4552990.0, ans=0.2 2024-08-19 22:11:41,191 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
29 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 22:11:54,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.310e+01 2.569e+01 2.819e+01 4.934e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-19 22:12:02,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4553390.0, ans=0.1 2024-08-19 22:12:03,573 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10750, loss[loss=0.06836, beats_loss=0.014, ecapa_loss=0.0001053, whisper_loss=0.05331, over 19307.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.09009, over 3831474.75 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:12:05,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4553390.0, ans=0.015 2024-08-19 22:12:22,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2024-08-19 22:12:42,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4553490.0, ans=0.125 2024-08-19 22:12:57,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4553590.0, ans=0.0 2024-08-19 22:13:11,217 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 22:13:31,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-19 22:13:44,353 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10800, loss[loss=0.1007, beats_loss=0.01241, ecapa_loss=0.0001369, whisper_loss=0.08694, over 22617.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001399, whisper_loss=0.08981, over 3849375.32 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:14:09,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4553990.0, ans=0.04949747468305833 2024-08-19 22:14:35,408 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 22:15:16,916 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.322e+01 2.506e+01 2.895e+01 8.208e+01, threshold=5.011e+01, percent-clipped=1.0 2024-08-19 22:15:26,479 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10850, loss[loss=0.09402, beats_loss=0.01155, ecapa_loss=0.000131, whisper_loss=0.08116, over 22180.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.09012, over 3839776.78 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:15:58,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4554490.0, ans=0.125 2024-08-19 22:16:06,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4554590.0, ans=0.0 2024-08-19 22:16:35,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-19 22:17:11,274 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10900, loss[loss=0.09062, beats_loss=0.009376, ecapa_loss=0.0001388, whisper_loss=0.07986, over 18496.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.09037, over 3859505.29 frames. 
], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:17:22,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4554890.0, ans=0.07 2024-08-19 22:17:48,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4554990.0, ans=0.125 2024-08-19 22:18:10,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4555090.0, ans=0.2 2024-08-19 22:18:23,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4555190.0, ans=0.125 2024-08-19 22:18:29,540 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 22:18:42,891 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 13 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-19 22:18:51,359 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.334e+01 2.577e+01 2.862e+01 5.392e+01, threshold=5.154e+01, percent-clipped=2.0 2024-08-19 22:18:51,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4555290.0, ans=0.125 2024-08-19 22:19:03,221 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 10950, loss[loss=0.1034, beats_loss=0.01016, ecapa_loss=0.0001458, whisper_loss=0.09181, over 22658.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01029, ecapa_loss=0.000141, whisper_loss=0.09125, over 3840965.72 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:19:26,254 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 22:20:04,775 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.77 vs. 
limit=22.5 2024-08-19 22:20:18,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4555690.0, ans=0.1 2024-08-19 22:20:55,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-19 22:20:57,496 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11000, loss[loss=0.1106, beats_loss=0.01, ecapa_loss=0.0001554, whisper_loss=0.099, over 22943.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01025, ecapa_loss=0.0001425, whisper_loss=0.09219, over 3843871.27 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:21:02,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4555890.0, ans=0.0 2024-08-19 22:21:10,582 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 22:21:33,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2024-08-19 22:21:40,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4555990.0, ans=0.2 2024-08-19 22:21:46,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4556090.0, ans=0.05 2024-08-19 22:21:58,306 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 22:22:08,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.01 vs. 
limit=15.0 2024-08-19 22:22:24,579 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=12.0 2024-08-19 22:22:40,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.290e+01 2.533e+01 2.932e+01 4.213e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-19 22:22:51,779 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11050, loss[loss=0.08692, beats_loss=0.01003, ecapa_loss=0.0001754, whisper_loss=0.07514, over 15570.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001415, whisper_loss=0.09167, over 3848657.50 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:23:22,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4556490.0, ans=0.125 2024-08-19 22:24:17,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.46 vs. limit=22.5 2024-08-19 22:24:28,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4556790.0, ans=0.125 2024-08-19 22:24:40,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4556790.0, ans=0.125 2024-08-19 22:24:45,938 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11100, loss[loss=0.08665, beats_loss=0.01088, ecapa_loss=0.0001397, whisper_loss=0.07437, over 16871.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01033, ecapa_loss=0.0001417, whisper_loss=0.09154, over 3835479.55 frames. 
], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:24:47,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4556890.0, ans=0.2 2024-08-19 22:24:59,173 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 22:25:04,811 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 18 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 22:25:12,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4556990.0, ans=0.0 2024-08-19 22:25:29,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2024-08-19 22:25:31,620 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 22:25:38,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4557090.0, ans=0.0 2024-08-19 22:25:45,047 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 22:26:18,795 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:26:28,496 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 22:26:30,298 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.214e+01 2.398e+01 2.634e+01 3.893e+01, threshold=4.795e+01, percent-clipped=0.0 2024-08-19 22:26:35,366 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-19 22:26:41,236 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11150, loss[loss=0.1019, beats_loss=0.009678, ecapa_loss=0.0001456, whisper_loss=0.09077, over 20430.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.09068, over 3836643.03 frames. ], batch size: 83, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:27:44,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4557590.0, ans=0.125 2024-08-19 22:28:38,629 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-19 22:28:38,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4557890.0, ans=0.07 2024-08-19 22:28:39,600 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11200, loss[loss=0.1137, beats_loss=0.01252, ecapa_loss=0.0001161, whisper_loss=0.1, over 22868.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.0914, over 3893812.42 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:28:57,155 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 22:29:24,462 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-19 22:29:24,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4557990.0, ans=0.07 2024-08-19 22:29:26,984 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 22:29:39,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4558090.0, ans=0.125 2024-08-19 22:29:48,056 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:30:13,206 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 22:30:30,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4558290.0, ans=0.07 2024-08-19 22:30:34,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.356e+01 2.585e+01 2.931e+01 9.399e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-19 22:30:47,230 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11250, loss[loss=0.08038, beats_loss=0.009737, ecapa_loss=0.0001299, whisper_loss=0.06935, over 15863.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.000139, whisper_loss=0.09089, over 3861163.34 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:30:56,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4558390.0, ans=0.0 2024-08-19 22:31:51,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4558590.0, ans=0.07 2024-08-19 22:32:10,907 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 24 from LS+wenet, 8 from Vox, 33 fro AS 2024-08-19 22:32:11,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4558690.0, ans=0.025 2024-08-19 22:32:20,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4558790.0, ans=0.125 2024-08-19 22:32:46,939 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11300, loss[loss=0.1, beats_loss=0.01021, ecapa_loss=0.0001658, whisper_loss=0.08816, over 21836.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001392, whisper_loss=0.09022, over 3861216.61 frames. 
], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:33:12,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4558990.0, ans=0.125 2024-08-19 22:33:24,099 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 22:33:38,035 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 27 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-19 22:33:55,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4559090.0, ans=0.0 2024-08-19 22:34:36,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.198e+01 2.361e+01 2.678e+01 4.685e+01, threshold=4.723e+01, percent-clipped=0.0 2024-08-19 22:34:47,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4559390.0, ans=0.125 2024-08-19 22:34:48,378 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11350, loss[loss=0.1004, beats_loss=0.01121, ecapa_loss=0.0001038, whisper_loss=0.08816, over 19922.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001412, whisper_loss=0.08955, over 3835342.97 frames. ], batch size: 76, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:35:50,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4559590.0, ans=0.125 2024-08-19 22:36:12,899 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 22:36:51,796 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11400, loss[loss=0.09655, beats_loss=0.01106, ecapa_loss=0.000154, whisper_loss=0.08395, over 21662.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001415, whisper_loss=0.08943, over 3818922.76 frames. 
], batch size: 84, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:37:25,176 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 44 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-19 22:38:22,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2024-08-19 22:38:29,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4560190.0, ans=0.125 2024-08-19 22:38:40,770 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 22:38:44,543 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.308e+01 2.574e+01 2.916e+01 2.255e+02, threshold=5.148e+01, percent-clipped=1.0 2024-08-19 22:38:56,446 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11450, loss[loss=0.1078, beats_loss=0.01188, ecapa_loss=0.0001535, whisper_loss=0.09439, over 23008.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001397, whisper_loss=0.08998, over 3801787.27 frames. ], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:39:14,058 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-19 22:39:23,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4560490.0, ans=0.0 2024-08-19 22:39:28,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4560490.0, ans=0.125 2024-08-19 22:39:31,375 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 22:39:32,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. 
limit=15.0 2024-08-19 22:39:40,872 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 17 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-19 22:39:42,649 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 22:40:06,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4560690.0, ans=0.125 2024-08-19 22:40:07,201 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 22:40:26,248 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 22:40:33,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4560790.0, ans=0.125 2024-08-19 22:40:36,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4560790.0, ans=0.1 2024-08-19 22:40:39,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4560790.0, ans=0.015 2024-08-19 22:40:55,971 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11500, loss[loss=0.0842, beats_loss=0.01109, ecapa_loss=0.0001679, whisper_loss=0.07143, over 19679.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.08978, over 3783140.25 frames. ], batch size: 81, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:41:05,608 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 22:41:09,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4560890.0, ans=0.0 2024-08-19 22:41:39,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. 
limit=10.0 2024-08-19 22:41:42,052 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 22:42:39,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.481e+01 2.784e+01 2.057e+02, threshold=4.962e+01, percent-clipped=3.0 2024-08-19 22:42:51,361 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11550, loss[loss=0.1202, beats_loss=0.009745, ecapa_loss=0.0001384, whisper_loss=0.1091, over 22053.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001397, whisper_loss=0.08928, over 3781081.92 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:42:57,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4561390.0, ans=0.125 2024-08-19 22:42:59,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4561390.0, ans=0.0 2024-08-19 22:43:04,001 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 22:43:59,879 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:44:02,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-19 22:44:02,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.73 vs. 
limit=15.0 2024-08-19 22:44:09,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4561690.0, ans=0.0 2024-08-19 22:44:11,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4561690.0, ans=0.125 2024-08-19 22:44:19,570 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 31 from LS+wenet, 11 from Vox, 45 fro AS 2024-08-19 22:44:30,631 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 22:44:43,866 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11600, loss[loss=0.109, beats_loss=0.01082, ecapa_loss=0.0001439, whisper_loss=0.09679, over 22361.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08997, over 3791159.05 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:44:44,029 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 22:44:58,120 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 22:45:06,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4561990.0, ans=0.2 2024-08-19 22:45:27,444 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 22:45:28,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.58 vs. 
limit=10.0 2024-08-19 22:45:35,537 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4562090.0, ans=0.1 2024-08-19 22:46:10,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.615e+01 2.277e+01 2.484e+01 2.833e+01 4.332e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 22:46:19,762 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11650, loss[loss=0.08886, beats_loss=0.01138, ecapa_loss=0.0001294, whisper_loss=0.07618, over 17200.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.09052, over 3800416.87 frames. ], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:46:26,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4562390.0, ans=0.125 2024-08-19 22:46:51,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4562490.0, ans=0.125 2024-08-19 22:46:55,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4562490.0, ans=0.125 2024-08-19 22:46:58,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4562590.0, ans=0.0 2024-08-19 22:47:21,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4562690.0, ans=0.125 2024-08-19 22:47:31,560 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 22:47:54,993 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 22:48:01,565 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11700, loss[loss=0.09804, beats_loss=0.01087, ecapa_loss=0.0001244, whisper_loss=0.08593, over 23482.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.0908, over 3797384.38 frames. ], batch size: 93, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:48:06,762 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 22:48:08,193 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 28 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-19 22:48:28,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2024-08-19 22:48:28,797 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 22:48:30,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4562990.0, ans=0.0 2024-08-19 22:48:56,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4563090.0, ans=0.2 2024-08-19 22:48:56,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4563090.0, ans=0.07 2024-08-19 22:49:17,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4563190.0, ans=0.0 2024-08-19 22:49:24,201 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 22:49:27,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4563290.0, ans=0.07 2024-08-19 22:49:27,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4563290.0, ans=0.125 2024-08-19 22:49:34,908 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.355e+01 2.582e+01 2.983e+01 7.922e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-19 22:49:47,380 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11750, loss[loss=0.09088, beats_loss=0.01178, ecapa_loss=0.0001639, whisper_loss=0.07746, over 21264.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001388, whisper_loss=0.09061, over 3837250.63 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-19 22:50:40,351 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 22:50:51,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4563590.0, ans=0.125 2024-08-19 22:51:06,853 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-19 22:51:22,727 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 22:51:30,295 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 22:51:30,759 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.728e-01 2024-08-19 22:51:41,658 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.70 vs. 
limit=6.0 2024-08-19 22:51:42,168 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11800, loss[loss=0.1047, beats_loss=0.00946, ecapa_loss=0.0001509, whisper_loss=0.09376, over 21338.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001407, whisper_loss=0.09078, over 3814460.30 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:51:44,648 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 22:51:45,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4563890.0, ans=0.2 2024-08-19 22:52:17,262 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 22:52:49,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4564190.0, ans=0.125 2024-08-19 22:52:56,067 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 12 from Vox, 47 fro AS 2024-08-19 22:53:05,121 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 22:53:05,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4564190.0, ans=0.125 2024-08-19 22:53:12,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4564290.0, ans=0.125 2024-08-19 22:53:21,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4564290.0, ans=0.125 2024-08-19 22:53:24,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.230e+01 2.402e+01 2.738e+01 3.572e+01, threshold=4.803e+01, percent-clipped=0.0 2024-08-19 22:53:31,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=12.0 2024-08-19 22:53:33,756 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11850, loss[loss=0.101, beats_loss=0.01281, ecapa_loss=0.000126, whisper_loss=0.08689, over 22174.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001393, whisper_loss=0.09079, over 3829764.73 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:53:40,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4564390.0, ans=10.0 2024-08-19 22:54:42,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4564690.0, ans=0.1 2024-08-19 22:55:25,501 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11900, loss[loss=0.07458, beats_loss=0.01087, ecapa_loss=0.0001387, whisper_loss=0.06232, over 12498.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001407, whisper_loss=0.09036, over 3806093.02 frames. 
], batch size: 50, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:55:56,396 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:56:02,138 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 from AS 2024-08-19 22:56:06,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4564990.0, ans=0.125 2024-08-19 22:56:15,567 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-19 22:56:22,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4565090.0, ans=0.1 2024-08-19 22:56:45,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4565190.0, ans=0.0 2024-08-19 22:56:48,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4565190.0, ans=0.125 2024-08-19 22:57:06,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.317e+01 2.609e+01 2.861e+01 6.356e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-19 22:57:08,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.24 vs. limit=6.0 2024-08-19 22:57:15,625 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 11950, loss[loss=0.09068, beats_loss=0.01305, ecapa_loss=0.0001447, whisper_loss=0.07619, over 18086.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001419, whisper_loss=0.0899, over 3817086.99 frames.
], batch size: 76, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:57:24,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4565390.0, ans=0.1 2024-08-19 22:57:32,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4565390.0, ans=0.125 2024-08-19 22:57:35,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4565490.0, ans=0.125 2024-08-19 22:57:39,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-19 22:57:42,652 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 from AS 2024-08-19 22:58:25,525 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 31 from LS+wenet, 13 from Vox, 26 from AS 2024-08-19 22:58:58,908 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12000, loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001409, whisper_loss=0.09247, over 17915.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001411, whisper_loss=0.09038, over 3848896.54 frames. ], batch size: 72, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:58:58,909 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-19 22:59:35,910 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005134, whisper_loss=0.2483, over 931116.00 frames. 2024-08-19 23:00:01,579 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on SV_voxceleb1: loss=0.003987, beats_loss=0, ecapa_loss=0.0003987, whisper_loss=0, over 944235.00 frames.
2024-08-19 23:01:39,577 INFO [train_multi_KD3.py:1150] (2/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 23:01:39,581 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-19 23:01:41,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4565890.0, ans=0.125 2024-08-19 23:01:47,363 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 28 from LS+wenet, 24 from Vox, 44 from AS 2024-08-19 23:01:53,327 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 27 from LS+wenet, 18 from Vox, 40 from AS 2024-08-19 23:02:56,142 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 22 from LS+wenet, 8 from Vox, 23 from AS 2024-08-19 23:03:21,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.392e+01 2.640e+01 2.904e+01 4.083e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-19 23:03:31,865 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12050, loss[loss=0.08809, beats_loss=0.01059, ecapa_loss=0.0001841, whisper_loss=0.07566, over 17673.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.09059, over 3856474.06 frames.
], batch size: 78, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:03:48,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4566390.0, ans=0.1 2024-08-19 23:03:55,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4566490.0, ans=0.125 2024-08-19 23:04:10,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4566490.0, ans=0.1 2024-08-19 23:04:17,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4566590.0, ans=0.95 2024-08-19 23:04:20,622 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 17 from Vox, 47 from AS 2024-08-19 23:04:31,192 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-19 23:04:49,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4566690.0, ans=0.2 2024-08-19 23:04:56,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4566690.0, ans=0.125 2024-08-19 23:04:58,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4566790.0, ans=0.125 2024-08-19 23:05:04,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4566790.0, ans=0.125 2024-08-19 23:05:11,022 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04405633732676506, model_norm_threshold=52.79664611816406 2024-08-19 23:05:11,183 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where
dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.250e+05, grad_sumsq=2.117e+07, orig_rms_sq=1.063e-02 2024-08-19 23:05:14,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4566790.0, ans=0.0 2024-08-19 23:05:19,541 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12100, loss[loss=0.1076, beats_loss=0.009519, ecapa_loss=0.0001736, whisper_loss=0.09636, over 13890.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001418, whisper_loss=0.09, over 3841197.97 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:05:26,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4566890.0, ans=0.125 2024-08-19 23:05:36,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4566890.0, ans=0.0 2024-08-19 23:05:38,728 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 from AS 2024-08-19 23:05:47,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4566990.0, ans=0.0 2024-08-19 23:05:54,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.32 vs. limit=22.5 2024-08-19 23:06:07,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4567090.0, ans=0.1 2024-08-19 23:06:07,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4567090.0, ans=0.125 2024-08-19 23:06:21,572 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs.
limit=15.0 2024-08-19 23:06:29,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4567190.0, ans=0.125 2024-08-19 23:06:34,195 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 23:06:52,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4567290.0, ans=0.125 2024-08-19 23:06:57,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.332e+01 2.618e+01 3.087e+01 1.198e+03, threshold=5.236e+01, percent-clipped=2.0 2024-08-19 23:07:06,256 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12150, loss[loss=0.08961, beats_loss=0.009759, ecapa_loss=0.0001031, whisper_loss=0.07882, over 20477.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001429, whisper_loss=0.09121, over 3850046.86 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:07:09,414 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-19 23:07:16,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4567390.0, ans=0.125 2024-08-19 23:07:25,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4567490.0, ans=0.125 2024-08-19 23:07:25,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4567490.0, ans=0.125 2024-08-19 23:07:56,359 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 from AS 2024-08-19 23:08:23,983 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
25 from LS+wenet, 29 from Vox, 36 from AS 2024-08-19 23:08:27,665 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 from AS 2024-08-19 23:08:43,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4567790.0, ans=0.2 2024-08-19 23:08:48,730 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS 2024-08-19 23:08:49,860 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12200, loss[loss=0.08705, beats_loss=0.008642, ecapa_loss=0.0001289, whisper_loss=0.07712, over 16968.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001424, whisper_loss=0.09068, over 3817902.69 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:09:18,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4567990.0, ans=0.1 2024-08-19 23:09:20,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4567990.0, ans=0.125 2024-08-19 23:09:23,549 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 from AS 2024-08-19 23:09:34,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4568090.0, ans=0.0 2024-08-19 23:09:47,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4568190.0, ans=0.125 2024-08-19 23:10:15,545 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 26 from LS+wenet, 26 from Vox, 23 from AS 2024-08-19 23:10:18,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.296e+01 2.498e+01 2.784e+01 3.860e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 23:10:22,190 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts.
21 from LS+wenet, 24 from Vox, 27 from AS 2024-08-19 23:10:22,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=22.5 2024-08-19 23:10:26,695 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12250, loss[loss=0.09892, beats_loss=0.009366, ecapa_loss=0.0001725, whisper_loss=0.08783, over 13522.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001421, whisper_loss=0.09081, over 3845229.00 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:10:32,953 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 from AS 2024-08-19 23:11:01,645 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS 2024-08-19 23:11:05,610 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS 2024-08-19 23:11:43,724 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS 2024-08-19 23:11:56,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4568790.0, ans=0.125 2024-08-19 23:12:03,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4568890.0, ans=15.0 2024-08-19 23:12:03,950 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12300, loss[loss=0.1115, beats_loss=0.01081, ecapa_loss=0.0001081, whisper_loss=0.09962, over 18824.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001423, whisper_loss=0.09083, over 3833423.88 frames. ], batch size: 70, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:12:19,923 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts.
27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-19 23:12:31,757 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 12 from Vox, 23 from AS 2024-08-19 23:12:45,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4569090.0, ans=0.125 2024-08-19 23:12:45,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4569090.0, ans=0.125 2024-08-19 23:12:53,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4569090.0, ans=0.0 2024-08-19 23:13:01,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4569090.0, ans=0.1 2024-08-19 23:13:03,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4569190.0, ans=0.125 2024-08-19 23:13:10,566 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2024-08-19 23:13:34,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.176e+01 2.435e+01 2.712e+01 4.279e+01, threshold=4.869e+01, percent-clipped=0.0 2024-08-19 23:13:39,477 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4569290.0, ans=0.1 2024-08-19 23:13:42,437 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12350, loss[loss=0.101, beats_loss=0.01184, ecapa_loss=0.0001725, whisper_loss=0.08743, over 21074.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001426, whisper_loss=0.0906, over 3830908.86 frames.
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:13:51,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-19 23:14:46,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4569690.0, ans=0.025 2024-08-19 23:15:14,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4569790.0, ans=0.0 2024-08-19 23:15:15,292 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 from AS 2024-08-19 23:15:18,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4569790.0, ans=0.2 2024-08-19 23:15:24,709 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12400, loss[loss=0.08545, beats_loss=0.01266, ecapa_loss=9.902e-05, whisper_loss=0.0718, over 16642.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.09085, over 3868695.81 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:15:46,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2024-08-19 23:16:03,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4569990.0, ans=0.1 2024-08-19 23:16:12,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4570090.0, ans=0.07 2024-08-19 23:16:31,439 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts.
28 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 23:17:00,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.374e+01 2.637e+01 2.909e+01 4.258e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-19 23:17:09,504 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12450, loss[loss=0.08958, beats_loss=0.01245, ecapa_loss=9.581e-05, whisper_loss=0.07617, over 23438.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001404, whisper_loss=0.08967, over 3873372.12 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:17:13,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4570390.0, ans=0.125 2024-08-19 23:17:48,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4570490.0, ans=0.0 2024-08-19 23:18:08,060 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 from AS 2024-08-19 23:18:32,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4570790.0, ans=0.125 2024-08-19 23:18:51,833 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12500, loss[loss=0.1115, beats_loss=0.009259, ecapa_loss=0.0001456, whisper_loss=0.1008, over 18450.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.09003, over 3847763.62 frames. ], batch size: 70, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:18:52,079 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 18 from LS+wenet, 17 from Vox, 40 from AS 2024-08-19 23:19:00,074 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 from AS 2024-08-19 23:19:12,475 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-19 23:19:25,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4570990.0, ans=0.1 2024-08-19 23:19:33,768 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 from AS 2024-08-19 23:19:54,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4571090.0, ans=0.125 2024-08-19 23:20:17,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4571190.0, ans=0.05 2024-08-19 23:20:37,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2024-08-19 23:20:37,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.234e+01 2.500e+01 2.814e+01 4.349e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 23:20:47,944 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12550, loss[loss=0.115, beats_loss=0.008252, ecapa_loss=0.0001704, whisper_loss=0.105, over 22238.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.09062, over 3855414.24 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:20:54,749 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 17 from LS+wenet, 10 from Vox, 22 from AS 2024-08-19 23:21:03,960 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 from AS 2024-08-19 23:21:08,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4571490.0, ans=0.2 2024-08-19 23:21:25,056 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts.
29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-19 23:21:29,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-19 23:21:41,269 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 from AS 2024-08-19 23:22:06,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4571690.0, ans=0.125 2024-08-19 23:22:35,981 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 from AS 2024-08-19 23:22:37,388 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12600, loss[loss=0.09096, beats_loss=0.01142, ecapa_loss=0.0001359, whisper_loss=0.07819, over 22527.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001412, whisper_loss=0.09055, over 3851717.55 frames. ], batch size: 94, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:22:51,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4571890.0, ans=0.1 2024-08-19 23:23:17,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4571990.0, ans=0.125 2024-08-19 23:23:21,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.34 vs. limit=22.5 2024-08-19 23:23:43,965 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:23:48,245 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 from AS 2024-08-19 23:23:53,334 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts.
20 from LS+wenet, 17 from Vox, 35 from AS 2024-08-19 23:24:10,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-19 23:24:32,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.292e+01 2.504e+01 2.662e+01 4.267e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-19 23:24:43,482 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12650, loss[loss=0.08722, beats_loss=0.01178, ecapa_loss=0.0001136, whisper_loss=0.07431, over 23147.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001401, whisper_loss=0.08989, over 3870868.70 frames. ], batch size: 93, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:24:53,548 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 34 from LS+wenet, 21 from Vox, 28 from AS 2024-08-19 23:25:01,155 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 18 from LS+wenet, 14 from Vox, 17 from AS 2024-08-19 23:25:16,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4572490.0, ans=0.0 2024-08-19 23:25:38,544 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts.
27 from LS+wenet, 29 from Vox, 32 from AS 2024-08-19 23:25:42,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4572590.0, ans=0.0 2024-08-19 23:25:42,837 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4572590.0, ans=0.1 2024-08-19 23:25:45,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4572590.0, ans=0.0 2024-08-19 23:26:06,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4572690.0, ans=0.125 2024-08-19 23:26:10,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4572690.0, ans=0.125 2024-08-19 23:26:13,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4572690.0, ans=0.125 2024-08-19 23:26:39,135 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12700, loss[loss=0.1004, beats_loss=0.0105, ecapa_loss=0.0001406, whisper_loss=0.08847, over 21957.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001412, whisper_loss=0.09036, over 3856289.35 frames. ], batch size: 93, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:27:12,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2024-08-19 23:27:18,130 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs.
limit=15.0 2024-08-19 23:27:22,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4572990.0, ans=0.125 2024-08-19 23:27:34,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4573090.0, ans=0.1 2024-08-19 23:27:39,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4573090.0, ans=0.125 2024-08-19 23:27:42,670 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 7 from LS+wenet, 13 from Vox, 30 from AS 2024-08-19 23:27:49,567 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 from AS 2024-08-19 23:28:17,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4573290.0, ans=6.0 2024-08-19 23:28:20,630 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 22 from LS+wenet, 20 from Vox, 38 from AS 2024-08-19 23:28:25,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.359e+01 2.539e+01 2.793e+01 4.602e+02, threshold=5.078e+01, percent-clipped=1.0 2024-08-19 23:28:34,452 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12750, loss[loss=0.1109, beats_loss=0.009415, ecapa_loss=0.000133, whisper_loss=0.1002, over 18034.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001407, whisper_loss=0.0903, over 3810800.49 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:29:09,772 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 23:29:18,841 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 23:29:36,597 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts.
21 from LS+wenet, 11 from Vox, 26 from AS 2024-08-19 23:29:43,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4573690.0, ans=0.0 2024-08-19 23:29:50,076 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 38 from LS+wenet, 24 from Vox, 30 from AS 2024-08-19 23:29:54,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4573690.0, ans=0.0 2024-08-19 23:29:59,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4573690.0, ans=0.05 2024-08-19 23:30:01,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4573690.0, ans=0.2 2024-08-19 23:30:21,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4573790.0, ans=0.125 2024-08-19 23:30:22,131 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 18 from LS+wenet, 31 from Vox, 29 from AS 2024-08-19 23:30:22,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4573790.0, ans=0.1 2024-08-19 23:30:34,023 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12800, loss[loss=0.1245, beats_loss=0.008753, ecapa_loss=0.0001579, whisper_loss=0.1142, over 15768.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001412, whisper_loss=0.09028, over 3825834.30 frames.
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:30:48,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4573890.0, ans=0.09899494936611666 2024-08-19 23:31:16,223 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:31:50,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-19 23:32:01,296 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 from AS 2024-08-19 23:32:05,673 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 from AS 2024-08-19 23:32:06,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.97 vs. limit=22.5 2024-08-19 23:32:27,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.310e+01 2.474e+01 2.739e+01 4.049e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-19 23:32:38,882 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12850, loss[loss=0.09411, beats_loss=0.01096, ecapa_loss=0.0001778, whisper_loss=0.08137, over 14256.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.08972, over 3810224.48 frames. ], batch size: 61, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:32:42,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs.
limit=22.5 2024-08-19 23:33:04,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4574490.0, ans=0.0 2024-08-19 23:33:09,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4574490.0, ans=0.025 2024-08-19 23:33:34,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2024-08-19 23:33:37,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4574590.0, ans=0.125 2024-08-19 23:33:40,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-19 23:33:45,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4574590.0, ans=0.125 2024-08-19 23:34:01,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4574690.0, ans=0.125 2024-08-19 23:34:09,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4574690.0, ans=0.0 2024-08-19 23:34:32,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=12.0 2024-08-19 23:34:43,113 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12900, loss[loss=0.1075, beats_loss=0.01054, ecapa_loss=0.0001484, whisper_loss=0.09552, over 22728.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.09018, over 3805415.57 frames. 
], batch size: 94, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:34:45,672 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 23:34:55,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4574890.0, ans=15.0 2024-08-19 23:35:12,634 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 18 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 23:35:20,475 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 23:35:21,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=15.0 2024-08-19 23:35:24,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4574990.0, ans=0.125 2024-08-19 23:35:42,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0 2024-08-19 23:35:57,545 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:35:59,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4575190.0, ans=0.125 2024-08-19 23:36:06,497 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 23:36:34,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.306e+01 2.601e+01 3.029e+01 4.481e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-19 23:36:44,572 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 12950, loss[loss=0.07786, beats_loss=0.01135, ecapa_loss=0.0001321, whisper_loss=0.06519, over 13260.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001405, whisper_loss=0.09019, over 3815896.98 frames. ], batch size: 50, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:36:47,839 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4575390.0, ans=0.125 2024-08-19 23:36:52,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4575390.0, ans=0.0 2024-08-19 23:37:27,921 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 23:37:32,107 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 23:37:35,970 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 23:37:40,463 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 23:38:10,054 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 23:38:12,437 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 23:38:42,774 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13000, loss[loss=0.1202, beats_loss=0.01053, ecapa_loss=0.0001286, whisper_loss=0.1084, over 23714.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001406, whisper_loss=0.08989, over 3813645.16 frames. ], batch size: 93, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:38:51,048 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
21 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-19 23:38:59,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4575890.0, ans=0.125 2024-08-19 23:38:59,692 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-19 23:39:27,942 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:39:34,286 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 16 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 23:39:42,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4576090.0, ans=0.1 2024-08-19 23:40:09,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4576190.0, ans=0.125 2024-08-19 23:40:15,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-19 23:40:22,838 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-19 23:40:26,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4576290.0, ans=0.1 2024-08-19 23:40:30,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.284e+01 2.434e+01 2.790e+01 4.214e+01, threshold=4.868e+01, percent-clipped=0.0 2024-08-19 23:40:38,332 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13050, loss[loss=0.09666, beats_loss=0.01017, ecapa_loss=0.0001082, whisper_loss=0.08541, over 14099.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001406, whisper_loss=0.09017, over 3815485.41 frames. 
], batch size: 52, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:41:19,645 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 23:41:24,478 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-19 23:41:35,925 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06296969205141068, model_norm_threshold=48.684600830078125 2024-08-19 23:41:36,089 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.824e+04, grad_sumsq=7.824e+04, orig_rms_sq=1.000e+00 2024-08-19 23:41:53,491 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 23:41:58,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4576690.0, ans=0.5 2024-08-19 23:42:09,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4576790.0, ans=0.0 2024-08-19 23:42:22,011 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13100, loss[loss=0.1181, beats_loss=0.009135, ecapa_loss=0.0001837, whisper_loss=0.1071, over 21875.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01071, ecapa_loss=0.000141, whisper_loss=0.08979, over 3823158.84 frames. 
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:42:32,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4576890.0, ans=0.125 2024-08-19 23:42:38,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4576890.0, ans=0.125 2024-08-19 23:42:38,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=10.0 2024-08-19 23:43:04,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4577090.0, ans=22.5 2024-08-19 23:44:02,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.292e+01 2.525e+01 2.909e+01 7.731e+02, threshold=5.050e+01, percent-clipped=3.0 2024-08-19 23:44:10,311 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13150, loss[loss=0.09147, beats_loss=0.0103, ecapa_loss=0.0001315, whisper_loss=0.07986, over 22485.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.08994, over 3800741.56 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:44:16,316 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 18 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 23:44:21,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4577390.0, ans=0.125 2024-08-19 23:44:46,950 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 24 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-19 23:45:08,907 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 23:45:11,062 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 23:45:43,169 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13200, loss[loss=0.09082, beats_loss=0.01087, ecapa_loss=0.0001317, whisper_loss=0.07863, over 14637.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.09021, over 3793723.15 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:45:47,951 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 23:45:55,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=12.0 2024-08-19 23:46:53,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4578290.0, ans=0.1 2024-08-19 23:46:57,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4578290.0, ans=0.0 2024-08-19 23:46:58,611 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 23:47:04,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4578290.0, ans=0.05 2024-08-19 23:47:05,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.290e+01 2.445e+01 2.755e+01 3.843e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-19 23:47:12,574 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13250, loss[loss=0.08638, beats_loss=0.01273, ecapa_loss=0.0001582, whisper_loss=0.07207, over 21511.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001421, whisper_loss=0.08989, over 3798380.18 frames. 
], batch size: 93, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:47:42,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4578490.0, ans=0.2 2024-08-19 23:47:42,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4578490.0, ans=0.125 2024-08-19 23:47:44,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2024-08-19 23:47:50,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:48:07,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4578590.0, ans=0.04949747468305833 2024-08-19 23:48:07,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:48:08,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:48:13,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4578690.0, ans=0.0 2024-08-19 23:48:21,099 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 23:48:31,575 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 23:48:33,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4578790.0, ans=0.125 2024-08-19 23:48:51,793 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13300, loss[loss=0.09229, beats_loss=0.008729, ecapa_loss=0.0001639, whisper_loss=0.08192, over 20560.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001411, whisper_loss=0.08976, over 3841536.50 frames. ], batch size: 84, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:48:54,029 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-19 23:49:08,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4578890.0, ans=0.2 2024-08-19 23:49:20,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4578990.0, ans=0.125 2024-08-19 23:49:26,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4578990.0, ans=0.0 2024-08-19 23:49:32,656 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 23:49:35,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4579090.0, ans=15.0 2024-08-19 23:49:42,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=4579090.0, ans=0.02 2024-08-19 23:49:54,973 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 23:50:00,345 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 23:50:06,226 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-19 23:50:07,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-08-19 23:50:09,920 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:50:17,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.270e+01 2.522e+01 2.849e+01 4.114e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 23:50:23,051 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-19 23:50:24,320 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13350, loss[loss=0.1079, beats_loss=0.008675, ecapa_loss=0.0001861, whisper_loss=0.09738, over 20100.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001427, whisper_loss=0.0898, over 3839661.78 frames. ], batch size: 86, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:50:29,099 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 23:50:43,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4579490.0, ans=0.125 2024-08-19 23:50:53,673 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 23:51:18,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4579690.0, ans=0.125 2024-08-19 23:51:30,766 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 23:51:34,419 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 19 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-19 23:51:58,116 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13400, loss[loss=0.1074, beats_loss=0.01072, ecapa_loss=0.0001413, whisper_loss=0.09523, over 14448.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001425, whisper_loss=0.0899, over 3847497.40 frames. ], batch size: 58, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:52:02,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4579890.0, ans=0.0 2024-08-19 23:52:21,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-19 23:52:25,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4579990.0, ans=0.125 2024-08-19 23:52:26,562 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 9 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 23:52:35,000 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4580090.0, ans=0.025 2024-08-19 23:53:20,687 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-19 23:53:25,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.574e+01 2.343e+01 2.589e+01 2.932e+01 2.538e+02, threshold=5.179e+01, percent-clipped=4.0 2024-08-19 23:53:33,084 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13450, loss[loss=0.09362, beats_loss=0.01223, ecapa_loss=0.0001524, whisper_loss=0.07987, over 21078.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001421, whisper_loss=0.08994, over 3827263.29 frames. 
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:53:40,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-19 23:53:47,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2024-08-19 23:53:49,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4580390.0, ans=0.125 2024-08-19 23:54:13,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4580590.0, ans=0.125 2024-08-19 23:54:53,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=12.0 2024-08-19 23:54:58,306 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 23:55:05,987 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4580790.0, ans=0.2 2024-08-19 23:55:08,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4580790.0, ans=0.1 2024-08-19 23:55:10,925 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13500, loss[loss=0.0786, beats_loss=0.01199, ecapa_loss=0.000169, whisper_loss=0.06492, over 14065.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.0905, over 3831519.19 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:55:21,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. 
limit=15.0 2024-08-19 23:55:24,500 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-19 23:55:36,691 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 33 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 23:56:09,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4581190.0, ans=0.0 2024-08-19 23:56:09,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4581190.0, ans=0.125 2024-08-19 23:56:29,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4581290.0, ans=0.125 2024-08-19 23:56:31,346 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 23:56:35,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.361e+01 2.616e+01 2.856e+01 5.147e+01, threshold=5.232e+01, percent-clipped=0.0 2024-08-19 23:56:43,171 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13550, loss[loss=0.1102, beats_loss=0.01055, ecapa_loss=0.0001298, whisper_loss=0.0984, over 23880.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.09022, over 3812454.58 frames. ], batch size: 94, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:56:51,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4581390.0, ans=0.125 2024-08-19 23:56:57,895 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 23:57:11,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4581490.0, ans=0.035 2024-08-19 23:57:30,287 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 23:57:39,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4581690.0, ans=0.09899494936611666 2024-08-19 23:57:41,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4581690.0, ans=0.0 2024-08-19 23:57:43,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4581690.0, ans=0.0 2024-08-19 23:57:43,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2024-08-19 23:57:44,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4581690.0, ans=0.0 2024-08-19 23:57:47,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4581690.0, ans=0.1 2024-08-19 23:58:14,950 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 17 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 23:58:17,086 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13600, loss[loss=0.0988, beats_loss=0.01015, ecapa_loss=0.000163, whisper_loss=0.08702, over 12297.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001415, whisper_loss=0.09044, over 3830208.31 frames. 
], batch size: 51, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:58:41,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4581990.0, ans=0.1 2024-08-19 23:58:47,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4581990.0, ans=0.125 2024-08-19 23:58:56,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4582090.0, ans=0.0 2024-08-19 23:58:56,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-19 23:59:11,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4582190.0, ans=0.125 2024-08-19 23:59:16,136 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 21 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-19 23:59:25,229 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 23:59:25,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4582190.0, ans=0.125 2024-08-19 23:59:35,328 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 23:59:39,899 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.261e+01 2.439e+01 2.757e+01 6.326e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-19 23:59:46,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4582390.0, ans=0.0 2024-08-19 23:59:47,343 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13650, loss[loss=0.1174, beats_loss=0.01092, ecapa_loss=0.0001344, whisper_loss=0.1051, over 23322.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001415, whisper_loss=0.09074, over 3847329.85 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:59:48,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4582390.0, ans=0.2 2024-08-19 23:59:56,289 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 23:59:56,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4582390.0, ans=0.1 2024-08-20 00:00:22,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0 2024-08-20 00:00:24,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4582590.0, ans=0.125 2024-08-20 00:01:20,257 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13700, loss[loss=0.09669, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.08494, over 19360.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.09112, over 3835895.49 frames. ], batch size: 77, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:01:22,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4582890.0, ans=0.125 2024-08-20 00:01:45,043 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
19 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 00:01:56,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4583090.0, ans=0.125 2024-08-20 00:02:02,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-20 00:02:46,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.351e+01 2.599e+01 2.817e+01 2.023e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-20 00:02:49,438 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 00:02:53,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4583390.0, ans=0.125 2024-08-20 00:02:54,317 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13750, loss[loss=0.1262, beats_loss=0.01034, ecapa_loss=0.0001159, whisper_loss=0.1147, over 22594.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001413, whisper_loss=0.09087, over 3831471.81 frames. ], batch size: 85, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:03:21,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4583490.0, ans=0.09899494936611666 2024-08-20 00:03:21,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4583490.0, ans=0.05 2024-08-20 00:03:23,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4583490.0, ans=0.0 2024-08-20 00:03:32,518 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 
25 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 00:03:39,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4583590.0, ans=0.125 2024-08-20 00:03:41,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4583590.0, ans=0.025 2024-08-20 00:03:45,105 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 00:03:47,997 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 00:04:01,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4583690.0, ans=0.04949747468305833 2024-08-20 00:04:03,094 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 14 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 00:04:27,655 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13800, loss[loss=0.0749, beats_loss=0.0111, ecapa_loss=0.0001506, whisper_loss=0.06229, over 16251.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001417, whisper_loss=0.09065, over 3820661.40 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:04:46,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=4583990.0, ans=0.02 2024-08-20 00:04:54,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4583990.0, ans=0.125 2024-08-20 00:05:08,181 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 00:05:51,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.269e+01 2.538e+01 2.800e+01 5.388e+01, threshold=5.076e+01, percent-clipped=1.0 2024-08-20 00:05:57,939 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13850, loss[loss=0.1004, beats_loss=0.01041, ecapa_loss=0.0001673, whisper_loss=0.08832, over 17197.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001415, whisper_loss=0.08967, over 3805583.35 frames. ], batch size: 72, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:06:10,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-08-20 00:06:28,517 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 00:06:32,375 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 00:07:16,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5 2024-08-20 00:07:18,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4584790.0, ans=0.125 2024-08-20 00:07:22,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4584790.0, ans=0.125 2024-08-20 00:07:24,227 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-20 00:07:28,615 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
20 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-20 00:07:30,774 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13900, loss[loss=0.1204, beats_loss=0.007218, ecapa_loss=0.0001685, whisper_loss=0.1115, over 13707.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001404, whisper_loss=0.08994, over 3815355.21 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:07:31,026 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 00:07:55,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4584990.0, ans=0.125 2024-08-20 00:08:03,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4584990.0, ans=0.0 2024-08-20 00:08:40,972 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 00:08:52,750 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 00:08:56,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.301e+01 2.533e+01 2.957e+01 6.862e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-20 00:09:03,977 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 13950, loss[loss=0.09202, beats_loss=0.01309, ecapa_loss=0.000125, whisper_loss=0.07769, over 14377.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001414, whisper_loss=0.08988, over 3821332.84 frames. 
], batch size: 57, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:09:34,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4585490.0, ans=0.125 2024-08-20 00:09:42,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4585590.0, ans=0.125 2024-08-20 00:09:52,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4585590.0, ans=0.2 2024-08-20 00:09:58,838 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 20 from LS+wenet, 21 from Vox, 10 fro AS 2024-08-20 00:10:00,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4585690.0, ans=0.125 2024-08-20 00:10:09,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4585690.0, ans=0.0 2024-08-20 00:10:40,052 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14000, loss[loss=0.08696, beats_loss=0.01036, ecapa_loss=0.0001635, whisper_loss=0.07496, over 18083.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.0899, over 3818133.65 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:10:40,382 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 00:11:14,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4586090.0, ans=0.0 2024-08-20 00:11:18,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2024-08-20 00:11:54,840 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 00:12:07,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.217e+01 2.438e+01 2.736e+01 1.084e+02, threshold=4.877e+01, percent-clipped=1.0 2024-08-20 00:12:08,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4586290.0, ans=0.125 2024-08-20 00:12:14,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4586390.0, ans=0.125 2024-08-20 00:12:15,044 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14050, loss[loss=0.07523, beats_loss=0.01223, ecapa_loss=0.0001443, whisper_loss=0.06156, over 17348.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001416, whisper_loss=0.09003, over 3805902.18 frames. ], batch size: 72, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:12:19,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-20 00:12:50,726 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-20 00:13:06,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=15.0 2024-08-20 00:13:49,160 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14100, loss[loss=0.1184, beats_loss=0.009367, ecapa_loss=0.0001696, whisper_loss=0.1073, over 17258.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.09046, over 3827566.72 frames. 
], batch size: 70, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:13:56,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4586890.0, ans=0.1 2024-08-20 00:14:04,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4586890.0, ans=0.1 2024-08-20 00:14:11,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4586990.0, ans=0.0 2024-08-20 00:14:14,920 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-08-20 00:14:21,671 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 00:14:23,491 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 27 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 00:14:27,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4587090.0, ans=0.2 2024-08-20 00:14:27,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4587090.0, ans=0.0 2024-08-20 00:14:34,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4587090.0, ans=0.125 2024-08-20 00:14:40,935 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-20 00:14:46,938 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 35 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-20 00:14:47,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. 
limit=15.0 2024-08-20 00:15:18,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.279e+01 2.555e+01 2.827e+01 5.250e+01, threshold=5.111e+01, percent-clipped=1.0 2024-08-20 00:15:24,381 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14150, loss[loss=0.1121, beats_loss=0.007578, ecapa_loss=0.0001391, whisper_loss=0.1032, over 18925.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.09028, over 3842317.44 frames. ], batch size: 73, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:15:35,613 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 00:16:07,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4587590.0, ans=0.05 2024-08-20 00:16:16,286 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 00:16:44,179 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 00:16:46,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4587790.0, ans=0.125 2024-08-20 00:16:59,913 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14200, loss[loss=0.1035, beats_loss=0.01072, ecapa_loss=0.0001153, whisper_loss=0.09163, over 21297.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001397, whisper_loss=0.09057, over 3836474.76 frames. ], batch size: 81, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:17:05,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-08-20 00:17:11,832 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
28 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 00:17:17,011 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 00:17:18,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4587990.0, ans=0.0 2024-08-20 00:17:20,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-20 00:17:26,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4587990.0, ans=0.125 2024-08-20 00:17:32,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2024-08-20 00:17:45,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4588090.0, ans=0.05 2024-08-20 00:17:54,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4588190.0, ans=0.07 2024-08-20 00:17:59,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4588190.0, ans=0.0 2024-08-20 00:18:12,991 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 00:18:16,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4588290.0, ans=0.1 2024-08-20 00:18:22,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4588290.0, ans=0.125 2024-08-20 00:18:26,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4588290.0, ans=0.0 2024-08-20 00:18:27,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.209e+01 2.487e+01 2.835e+01 4.985e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-20 00:18:28,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5 2024-08-20 00:18:33,080 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14250, loss[loss=0.0865, beats_loss=0.009058, ecapa_loss=0.0001435, whisper_loss=0.07601, over 13135.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.09091, over 3814704.69 frames. ], batch size: 49, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:18:39,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4588390.0, ans=0.125 2024-08-20 00:18:41,550 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.58 vs. 
limit=12.0 2024-08-20 00:18:42,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4588390.0, ans=0.125 2024-08-20 00:19:13,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4588590.0, ans=0.125 2024-08-20 00:19:13,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4588590.0, ans=0.125 2024-08-20 00:19:20,917 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 00:19:26,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4588590.0, ans=0.0 2024-08-20 00:20:06,760 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14300, loss[loss=0.1034, beats_loss=0.009435, ecapa_loss=8.517e-05, whisper_loss=0.09315, over 15209.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.09115, over 3830100.81 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:20:16,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4588890.0, ans=0.125 2024-08-20 00:20:23,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4588990.0, ans=0.125 2024-08-20 00:20:39,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. 
limit=6.0 2024-08-20 00:20:42,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4589090.0, ans=0.125 2024-08-20 00:21:16,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4589190.0, ans=0.125 2024-08-20 00:21:18,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4589190.0, ans=0.125 2024-08-20 00:21:36,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.288e+01 2.504e+01 2.843e+01 5.964e+01, threshold=5.008e+01, percent-clipped=1.0 2024-08-20 00:21:42,163 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14350, loss[loss=0.1186, beats_loss=0.008324, ecapa_loss=0.0001768, whisper_loss=0.1085, over 14106.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.09149, over 3834989.24 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:22:03,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4589490.0, ans=0.025 2024-08-20 00:22:34,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4589590.0, ans=0.0 2024-08-20 00:22:36,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4589590.0, ans=0.0 2024-08-20 00:22:37,540 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 00:22:39,175 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 26 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-20 00:22:54,194 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
16 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 00:23:06,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4589790.0, ans=0.125 2024-08-20 00:23:16,992 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14400, loss[loss=0.07894, beats_loss=0.01147, ecapa_loss=0.0001389, whisper_loss=0.06608, over 20844.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09068, over 3803139.54 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:23:19,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4589890.0, ans=0.2 2024-08-20 00:23:20,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4589890.0, ans=0.0 2024-08-20 00:23:40,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4589990.0, ans=0.1 2024-08-20 00:23:49,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4589990.0, ans=0.125 2024-08-20 00:24:05,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4590090.0, ans=0.125 2024-08-20 00:24:14,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4590190.0, ans=0.2 2024-08-20 00:24:32,647 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 00:24:35,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4590290.0, ans=0.125 2024-08-20 00:24:35,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4590290.0, ans=0.0 2024-08-20 00:24:36,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0 2024-08-20 00:24:41,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.268e+01 2.508e+01 2.742e+01 3.367e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 00:24:48,101 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14450, loss[loss=0.09277, beats_loss=0.0105, ecapa_loss=0.0001171, whisper_loss=0.0811, over 16633.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.0899, over 3771630.16 frames. 
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:25:14,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4590490.0, ans=0.0 2024-08-20 00:25:44,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4590590.0, ans=0.1 2024-08-20 00:25:51,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4590690.0, ans=0.0 2024-08-20 00:25:59,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4590690.0, ans=0.125 2024-08-20 00:26:21,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4590790.0, ans=0.0 2024-08-20 00:26:24,556 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14500, loss[loss=0.09484, beats_loss=0.01024, ecapa_loss=0.0001329, whisper_loss=0.08327, over 21863.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001407, whisper_loss=0.09021, over 3775743.99 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:26:25,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4590890.0, ans=0.125 2024-08-20 00:26:26,704 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 00:26:30,883 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 00:26:31,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4590890.0, ans=0.0 2024-08-20 00:26:54,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4590990.0, ans=0.1 2024-08-20 00:26:55,652 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 00:27:02,392 WARNING [optim.py:496] (2/4) Scaling gradients by 0.034884583204984665, model_norm_threshold=50.15473556518555 2024-08-20 00:27:02,565 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.848e+05, grad_sumsq=8.328e+07, orig_rms_sq=1.062e-02 2024-08-20 00:27:11,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4591090.0, ans=0.0 2024-08-20 00:27:14,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4591090.0, ans=0.1 2024-08-20 00:27:33,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4591190.0, ans=0.125 2024-08-20 00:27:52,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.301e+01 2.496e+01 2.802e+01 1.438e+03, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 00:27:58,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4591390.0, ans=0.05 2024-08-20 00:27:59,123 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14550, loss[loss=0.09699, beats_loss=0.009625, ecapa_loss=0.000164, whisper_loss=0.08572, over 21827.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.09015, over 3790240.44 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:28:08,677 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 00:28:11,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=4591390.0, ans=0.02 2024-08-20 00:28:15,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4591390.0, ans=0.1 2024-08-20 00:28:22,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4591490.0, ans=0.1 2024-08-20 00:28:24,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4591490.0, ans=0.025 2024-08-20 00:28:59,347 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 00:29:08,539 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 00:29:24,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4591790.0, ans=0.125 2024-08-20 00:29:33,056 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14600, loss[loss=0.1019, beats_loss=0.00994, ecapa_loss=0.0001321, whisper_loss=0.09067, over 21388.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001404, whisper_loss=0.09092, over 3825146.65 frames. ], batch size: 86, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:29:34,826 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 00:29:35,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4591890.0, ans=0.0 2024-08-20 00:29:36,578 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 00:29:48,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=4591890.0, ans=0.1 2024-08-20 00:29:52,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4591990.0, ans=0.125 2024-08-20 00:29:56,553 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 00:30:02,346 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 00:30:07,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4591990.0, ans=0.125 2024-08-20 00:30:19,846 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 00:30:30,202 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 00:30:47,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4592290.0, ans=0.1 2024-08-20 00:30:50,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. 
limit=10.0 2024-08-20 00:31:02,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.406e+01 2.621e+01 2.917e+01 4.385e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 00:31:06,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4592390.0, ans=0.125 2024-08-20 00:31:06,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4592390.0, ans=0.0 2024-08-20 00:31:07,359 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14650, loss[loss=0.106, beats_loss=0.01005, ecapa_loss=0.0001329, whisper_loss=0.09466, over 21982.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001413, whisper_loss=0.08989, over 3803011.75 frames. ], batch size: 87, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:31:11,418 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 00:31:11,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4592390.0, ans=0.0 2024-08-20 00:31:22,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4592390.0, ans=0.125 2024-08-20 00:31:52,802 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 34 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 00:31:52,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4592590.0, ans=0.0 2024-08-20 00:32:24,593 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 00:32:28,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4592790.0, ans=0.1 2024-08-20 00:32:41,427 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14700, loss[loss=0.0809, beats_loss=0.01291, ecapa_loss=0.0001293, whisper_loss=0.0667, over 21083.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001416, whisper_loss=0.09039, over 3802034.20 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:32:41,789 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 00:32:55,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4592890.0, ans=0.1 2024-08-20 00:33:01,866 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 00:33:02,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4592990.0, ans=0.125 2024-08-20 00:33:15,801 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-20 00:33:48,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5 2024-08-20 00:33:57,741 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 15 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 00:33:59,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.10 vs. 
limit=22.5 2024-08-20 00:34:02,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4593290.0, ans=0.05 2024-08-20 00:34:12,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.392e+01 2.545e+01 2.884e+01 3.743e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-20 00:34:17,499 INFO [train_multi_KD3.py:1117] (2/4) Epoch 31, batch 14750, loss[loss=0.0895, beats_loss=0.01139, ecapa_loss=0.0001174, whisper_loss=0.07694, over 16991.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001408, whisper_loss=0.09014, over 3797307.93 frames. ], batch size: 66, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:34:21,105 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 00:34:27,589 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 00:34:34,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=12.0 2024-08-20 00:34:39,045 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 29 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 00:34:44,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4593490.0, ans=0.125 2024-08-20 00:34:44,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4593490.0, ans=0.0 2024-08-20 00:34:44,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4593490.0, ans=0.2 2024-08-20 00:34:51,106 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 00:35:18,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4593690.0, ans=0.09899494936611666 2024-08-20 00:35:20,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4593690.0, ans=0.0 2024-08-20 00:35:22,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2024-08-20 00:36:08,659 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 0, loss[loss=0.08716, beats_loss=0.01105, ecapa_loss=0.0001018, whisper_loss=0.07509, over 14186.00 frames. ], tot_loss[loss=0.08716, beats_loss=0.01105, ecapa_loss=0.0001018, whisper_loss=0.07509, over 14186.00 frames. ], batch size: 54, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:36:08,659 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 00:36:43,350 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005131, whisper_loss=0.2488, over 931116.00 frames. 2024-08-20 00:37:05,816 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 00:38:39,971 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 00:38:39,974 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 00:38:57,747 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 00:39:03,089 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 
30 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 00:39:06,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4593900.0, ans=0.1 2024-08-20 00:39:19,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4593900.0, ans=0.1 2024-08-20 00:39:34,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2024-08-20 00:39:43,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=22.5 2024-08-20 00:39:51,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4594000.0, ans=0.125 2024-08-20 00:39:58,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4594100.0, ans=0.125 2024-08-20 00:40:02,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4594100.0, ans=0.025 2024-08-20 00:40:03,651 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 00:40:10,967 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 00:40:18,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2024-08-20 00:40:40,591 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 50, loss[loss=0.1055, beats_loss=0.01016, ecapa_loss=0.0001256, whisper_loss=0.0941, over 19819.00 frames. 
], tot_loss[loss=0.1001, beats_loss=0.009121, ecapa_loss=0.0001425, whisper_loss=0.08952, over 850585.19 frames. ], batch size: 78, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:40:41,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4594300.0, ans=0.2 2024-08-20 00:40:55,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.478e+01 2.729e+01 3.043e+01 3.966e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-20 00:41:13,917 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-20 00:41:23,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-20 00:41:51,591 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 15 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 00:41:54,026 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 00:42:01,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4594600.0, ans=0.125 2024-08-20 00:42:14,593 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 00:42:38,545 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 100, loss[loss=0.08884, beats_loss=0.01019, ecapa_loss=0.0001357, whisper_loss=0.07729, over 22657.00 frames. ], tot_loss[loss=0.09877, beats_loss=0.00938, ecapa_loss=0.0001404, whisper_loss=0.08799, over 1503776.71 frames. 
], batch size: 93, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:43:01,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4594900.0, ans=0.2 2024-08-20 00:43:35,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4595000.0, ans=0.125 2024-08-20 00:43:35,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-20 00:43:49,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4595100.0, ans=0.1 2024-08-20 00:43:57,563 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 29 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 00:44:00,564 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.938e+01 2024-08-20 00:44:05,112 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 14 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 00:44:16,215 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 28 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 00:44:19,031 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 00:44:37,715 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 150, loss[loss=0.121, beats_loss=0.007418, ecapa_loss=0.0001395, whisper_loss=0.1121, over 22789.00 frames. ], tot_loss[loss=0.09902, beats_loss=0.009425, ecapa_loss=0.0001395, whisper_loss=0.0882, over 2041580.82 frames. 
], batch size: 84, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:44:45,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4595300.0, ans=0.125 2024-08-20 00:44:50,017 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.529e+01 2.741e+01 3.091e+01 3.915e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-20 00:45:00,291 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=12.0 2024-08-20 00:45:06,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4595400.0, ans=0.0 2024-08-20 00:45:10,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-20 00:45:29,615 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 00:45:53,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4595700.0, ans=0.125 2024-08-20 00:46:12,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-20 00:46:12,734 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 200, loss[loss=0.1064, beats_loss=0.01121, ecapa_loss=0.0001258, whisper_loss=0.09394, over 18213.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009502, ecapa_loss=0.0001407, whisper_loss=0.08967, over 2412358.30 frames. 
], batch size: 71, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:46:17,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4595800.0, ans=0.125 2024-08-20 00:46:19,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4595800.0, ans=0.07 2024-08-20 00:46:41,310 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 10 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 00:46:43,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4595900.0, ans=0.2 2024-08-20 00:46:45,632 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03673094883561134, model_norm_threshold=54.82755661010742 2024-08-20 00:46:45,851 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.205e+05, grad_sumsq=3.205e+05, orig_rms_sq=1.000e+00 2024-08-20 00:46:53,345 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 00:47:05,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4596000.0, ans=0.125 2024-08-20 00:47:05,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4596000.0, ans=0.09899494936611666 2024-08-20 00:47:09,279 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
39 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 00:47:12,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4596100.0, ans=0.0 2024-08-20 00:47:15,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4596100.0, ans=0.2 2024-08-20 00:47:15,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4596100.0, ans=0.0 2024-08-20 00:47:20,988 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-20 00:47:43,269 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 250, loss[loss=0.09672, beats_loss=0.01203, ecapa_loss=0.0001338, whisper_loss=0.08335, over 22945.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009689, ecapa_loss=0.0001405, whisper_loss=0.09063, over 2721685.74 frames. ], batch size: 93, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:47:53,497 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.311e+01 2.593e+01 2.981e+01 1.493e+03, threshold=5.185e+01, percent-clipped=1.0 2024-08-20 00:48:18,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4596500.0, ans=0.125 2024-08-20 00:48:33,984 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
16 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 00:48:41,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4596600.0, ans=0.125 2024-08-20 00:48:46,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4596600.0, ans=0.1 2024-08-20 00:49:09,725 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 300, loss[loss=0.1029, beats_loss=0.01165, ecapa_loss=0.0001496, whisper_loss=0.08971, over 22951.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.00984, ecapa_loss=0.0001402, whisper_loss=0.09109, over 2941634.75 frames. ], batch size: 93, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:49:23,062 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 00:49:25,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4596800.0, ans=0.1 2024-08-20 00:49:25,996 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03387049213051796, model_norm_threshold=51.854286193847656 2024-08-20 00:49:26,167 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.005e+05, grad_sumsq=9.116e+04, orig_rms_sq=3.297e+00 2024-08-20 00:49:34,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.56 vs. limit=6.0 2024-08-20 00:49:35,414 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 00:49:49,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4597000.0, ans=0.1 2024-08-20 00:49:57,013 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 
9 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 00:50:37,089 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 350, loss[loss=0.1009, beats_loss=0.01099, ecapa_loss=0.0001665, whisper_loss=0.08828, over 21725.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009957, ecapa_loss=0.0001413, whisper_loss=0.08997, over 3104129.15 frames. ], batch size: 91, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:50:48,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.225e+01 2.468e+01 2.778e+01 1.531e+03, threshold=4.937e+01, percent-clipped=2.0 2024-08-20 00:50:53,393 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 00:51:01,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-08-20 00:51:26,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4597500.0, ans=0.1 2024-08-20 00:51:41,388 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 20 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-20 00:51:56,731 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 00:52:04,907 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 400, loss[loss=0.1013, beats_loss=0.0109, ecapa_loss=9.914e-05, whisper_loss=0.08939, over 18158.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009983, ecapa_loss=0.0001405, whisper_loss=0.0902, over 3247119.04 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:52:23,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4597900.0, ans=0.1 2024-08-20 00:52:29,454 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 00:52:36,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4597900.0, ans=0.1 2024-08-20 00:52:38,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-20 00:52:48,567 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 00:53:21,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4598200.0, ans=0.125 2024-08-20 00:53:30,296 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 00:53:35,394 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 450, loss[loss=0.1092, beats_loss=0.009249, ecapa_loss=0.0001384, whisper_loss=0.09856, over 14266.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01003, ecapa_loss=0.0001413, whisper_loss=0.08989, over 3377632.92 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:53:37,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4598300.0, ans=0.0 2024-08-20 00:53:41,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4598300.0, ans=0.125 2024-08-20 00:53:41,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4598300.0, ans=0.05 2024-08-20 00:53:45,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.300e+01 2.526e+01 2.780e+01 3.592e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-20 00:54:11,017 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 
9 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 00:54:19,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4598500.0, ans=0.125 2024-08-20 00:54:51,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4598700.0, ans=0.0 2024-08-20 00:54:51,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4598700.0, ans=0.1 2024-08-20 00:54:58,552 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 14 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 00:55:01,784 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 500, loss[loss=0.07542, beats_loss=0.01235, ecapa_loss=0.0001525, whisper_loss=0.06155, over 17336.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01003, ecapa_loss=0.0001416, whisper_loss=0.09012, over 3463923.97 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:55:01,995 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 15 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 00:55:11,662 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.87 vs. limit=22.5 2024-08-20 00:55:17,791 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 00:55:23,398 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 00:55:25,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.06 vs. 
limit=15.0 2024-08-20 00:55:55,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4599100.0, ans=0.125 2024-08-20 00:56:00,292 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 00:56:03,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4599100.0, ans=0.1 2024-08-20 00:56:10,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2024-08-20 00:56:18,114 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-20 00:56:29,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4599300.0, ans=0.125 2024-08-20 00:56:31,181 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 550, loss[loss=0.08777, beats_loss=0.01041, ecapa_loss=0.0001377, whisper_loss=0.07598, over 16059.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01005, ecapa_loss=0.0001416, whisper_loss=0.09007, over 3513985.95 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:56:41,834 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.287e+01 2.466e+01 2.719e+01 4.330e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-20 00:56:48,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4599400.0, ans=0.2 2024-08-20 00:56:59,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4599400.0, ans=0.2 2024-08-20 00:57:18,364 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 00:57:24,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4599600.0, ans=0.125 2024-08-20 00:57:35,883 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 15 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 00:57:42,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4599600.0, ans=0.125 2024-08-20 00:57:44,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4599700.0, ans=0.125 2024-08-20 00:57:49,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4599700.0, ans=0.125 2024-08-20 00:57:57,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4599700.0, ans=0.0 2024-08-20 00:57:59,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4599700.0, ans=0.05 2024-08-20 00:58:02,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4599800.0, ans=0.0 2024-08-20 00:58:03,838 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 600, loss[loss=0.1023, beats_loss=0.01007, ecapa_loss=0.0001242, whisper_loss=0.09097, over 17945.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01007, ecapa_loss=0.0001401, whisper_loss=0.09009, over 3543775.30 frames. ], batch size: 69, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:58:18,450 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 00:58:33,330 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 00:58:38,738 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 00:58:46,997 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 00:59:07,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4600100.0, ans=0.125 2024-08-20 00:59:24,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4600200.0, ans=0.125 2024-08-20 00:59:35,290 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 650, loss[loss=0.1292, beats_loss=0.008043, ecapa_loss=0.0001678, whisper_loss=0.1195, over 21474.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01008, ecapa_loss=0.0001401, whisper_loss=0.0898, over 3597849.43 frames. ], batch size: 84, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:59:44,619 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 18 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-20 00:59:46,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.232e+01 2.532e+01 2.844e+01 3.570e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-20 00:59:49,613 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-20 00:59:58,833 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 01:00:00,471 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 01:00:20,697 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07462587207555771, model_norm_threshold=50.64724349975586 2024-08-20 01:00:20,863 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.123e+04, grad_sumsq=9.123e+04, orig_rms_sq=1.000e+00 2024-08-20 01:00:30,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4600600.0, ans=0.125 2024-08-20 01:00:40,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4600600.0, ans=0.125 2024-08-20 01:00:58,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4600700.0, ans=0.125 2024-08-20 01:01:04,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4600800.0, ans=0.0 2024-08-20 01:01:05,202 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 700, loss[loss=0.1238, beats_loss=0.008593, ecapa_loss=0.0001981, whisper_loss=0.1132, over 22078.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01016, ecapa_loss=0.0001407, whisper_loss=0.09014, over 3656592.97 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:01:47,151 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 01:01:53,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4601000.0, ans=0.09899494936611666 2024-08-20 01:01:56,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4601000.0, ans=0.1 2024-08-20 01:02:22,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4601200.0, ans=0.2 2024-08-20 01:02:34,858 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 750, loss[loss=0.08935, beats_loss=0.01117, ecapa_loss=0.0001273, whisper_loss=0.0769, over 16255.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01019, ecapa_loss=0.0001387, whisper_loss=0.09028, over 3674462.43 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:02:45,505 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.348e+01 2.626e+01 2.965e+01 6.787e+02, threshold=5.252e+01, percent-clipped=3.0 2024-08-20 01:02:49,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4601300.0, ans=0.125 2024-08-20 01:03:09,645 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
35 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-20 01:03:22,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4601500.0, ans=0.125 2024-08-20 01:03:35,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4601600.0, ans=0.125 2024-08-20 01:03:49,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4601700.0, ans=0.0 2024-08-20 01:03:59,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4601800.0, ans=0.2 2024-08-20 01:04:00,100 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 800, loss[loss=0.1107, beats_loss=0.01046, ecapa_loss=0.0001444, whisper_loss=0.09884, over 23147.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01032, ecapa_loss=0.0001392, whisper_loss=0.08899, over 3743028.64 frames. ], batch size: 94, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:04:02,779 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 01:04:43,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4602000.0, ans=0.0 2024-08-20 01:04:45,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4602000.0, ans=0.0 2024-08-20 01:04:46,806 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
16 from LS+wenet, 14 from Vox, 22 from AS
2024-08-20 01:04:51,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4602100.0, ans=0.95
2024-08-20 01:04:58,961 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2024-08-20 01:05:11,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4602200.0, ans=0.125
2024-08-20 01:05:26,416 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 850, loss[loss=0.09524, beats_loss=0.01187, ecapa_loss=0.0001112, whisper_loss=0.08225, over 22564.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01033, ecapa_loss=0.0001386, whisper_loss=0.08917, over 3740365.72 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 01:05:27,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2024-08-20 01:05:29,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4602300.0, ans=0.1
2024-08-20 01:05:37,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.270e+01 2.498e+01 2.868e+01 4.208e+01, threshold=4.997e+01, percent-clipped=0.0
2024-08-20 01:05:44,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4602400.0, ans=0.2
2024-08-20 01:05:47,445 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS
2024-08-20 01:06:00,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4602500.0, ans=0.0
2024-08-20 01:06:01,645 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 25 from LS+wenet, 24 from Vox, 26 from AS
2024-08-20 01:06:22,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4602600.0, ans=0.125
2024-08-20 01:06:25,233 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 from AS
2024-08-20 01:06:26,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4602600.0, ans=0.0
2024-08-20 01:06:48,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4602700.0, ans=0.125
2024-08-20 01:06:49,758 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.716e-03
2024-08-20 01:06:53,210 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 900, loss[loss=0.0789, beats_loss=0.01298, ecapa_loss=9.827e-05, whisper_loss=0.06494, over 13496.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01042, ecapa_loss=0.0001396, whisper_loss=0.0886, over 3734971.68 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 01:07:01,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=15.0
2024-08-20 01:07:11,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4602900.0, ans=0.125
2024-08-20 01:07:19,910 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 01:07:38,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4603000.0, ans=0.0
2024-08-20 01:07:38,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5
2024-08-20 01:08:01,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5
2024-08-20 01:08:11,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4603200.0, ans=0.1
2024-08-20 01:08:11,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4603200.0, ans=0.125
2024-08-20 01:08:13,502 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 01:08:19,546 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 950, loss[loss=0.09933, beats_loss=0.01026, ecapa_loss=0.0001333, whisper_loss=0.08773, over 22459.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01033, ecapa_loss=0.0001401, whisper_loss=0.08867, over 3745728.34 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:08:31,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.373e+01 2.705e+01 3.029e+01 3.919e+02, threshold=5.410e+01, percent-clipped=3.0
2024-08-20 01:08:34,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4603300.0, ans=0.2
2024-08-20 01:09:01,381 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS
2024-08-20 01:09:08,127 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 21 from Vox, 20 from AS
2024-08-20 01:09:15,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4603600.0, ans=0.125
2024-08-20 01:09:16,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4603600.0, ans=0.125
2024-08-20 01:09:35,754 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 15 from LS+wenet, 16 from Vox, 33 from AS
2024-08-20 01:09:46,131 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1000, loss[loss=0.1097, beats_loss=0.009045, ecapa_loss=0.0001592, whisper_loss=0.0991, over 18143.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01027, ecapa_loss=0.0001408, whisper_loss=0.08879, over 3737439.14 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:09:49,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0
2024-08-20 01:09:56,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4603800.0, ans=0.125
2024-08-20 01:10:11,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0
2024-08-20 01:10:12,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4603900.0, ans=0.0
2024-08-20 01:10:19,322 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS
2024-08-20 01:10:23,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4604000.0, ans=0.125
2024-08-20 01:10:50,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4604100.0, ans=0.125
2024-08-20 01:10:52,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0
2024-08-20 01:11:18,638 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1050, loss[loss=0.09073, beats_loss=0.009766, ecapa_loss=0.0001321, whisper_loss=0.07964, over 20415.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.08879, over 3742124.53 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:11:32,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.222e+01 2.426e+01 2.735e+01 4.130e+01, threshold=4.852e+01, percent-clipped=0.0
2024-08-20 01:11:34,298 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-20 01:11:52,671 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 32 from LS+wenet, 15 from Vox, 39 from AS
2024-08-20 01:12:00,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4604500.0, ans=0.1
2024-08-20 01:12:17,724 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 40 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 01:12:18,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4604600.0, ans=0.0
2024-08-20 01:12:19,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4604600.0, ans=0.0
2024-08-20 01:12:40,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4604700.0, ans=0.125
2024-08-20 01:12:48,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4604800.0, ans=0.0
2024-08-20 01:12:49,408 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1100, loss[loss=0.08932, beats_loss=0.01058, ecapa_loss=0.0001431, whisper_loss=0.07731, over 22741.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01028, ecapa_loss=0.0001396, whisper_loss=0.08907, over 3747631.13 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:12:49,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4604800.0, ans=0.1
2024-08-20 01:12:59,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4604800.0, ans=0.1
2024-08-20 01:12:59,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4604800.0, ans=0.125
2024-08-20 01:13:02,377 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 14 from LS+wenet, 19 from Vox, 18 from AS
2024-08-20 01:13:11,103 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 31 from LS+wenet, 29 from Vox, 36 from AS
2024-08-20 01:13:35,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4605000.0, ans=0.2
2024-08-20 01:13:56,966 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS
2024-08-20 01:14:15,501 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1150, loss[loss=0.1044, beats_loss=0.01016, ecapa_loss=0.0001609, whisper_loss=0.09266, over 13155.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01028, ecapa_loss=0.00014, whisper_loss=0.08914, over 3758372.77 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:14:27,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.314e+01 2.565e+01 2.766e+01 1.499e+02, threshold=5.130e+01, percent-clipped=2.0
2024-08-20 01:14:36,905 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=22.5
2024-08-20 01:15:07,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4605600.0, ans=0.125
2024-08-20 01:15:10,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4605600.0, ans=0.07
2024-08-20 01:15:30,280 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 25 from Vox, 25 from AS
2024-08-20 01:15:40,909 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1200, loss[loss=0.09453, beats_loss=0.01057, ecapa_loss=0.0001835, whisper_loss=0.08212, over 20711.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0103, ecapa_loss=0.0001386, whisper_loss=0.0888, over 3756886.96 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:15:55,602 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS
2024-08-20 01:16:28,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=12.0
2024-08-20 01:16:42,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4606100.0, ans=0.2
2024-08-20 01:16:57,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4606200.0, ans=0.125
2024-08-20 01:17:14,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4606300.0, ans=0.07
2024-08-20 01:17:15,237 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1250, loss[loss=0.1188, beats_loss=0.008329, ecapa_loss=0.0001462, whisper_loss=0.109, over 22697.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0103, ecapa_loss=0.0001387, whisper_loss=0.08911, over 3749104.49 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:17:17,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4606300.0, ans=0.0
2024-08-20 01:17:22,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0
2024-08-20 01:17:32,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.240e+01 2.537e+01 2.870e+01 6.660e+01, threshold=5.073e+01, percent-clipped=2.0
2024-08-20 01:17:37,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4606300.0, ans=0.1
2024-08-20 01:17:45,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4606400.0, ans=0.0
2024-08-20 01:17:51,348 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 from AS
2024-08-20 01:18:04,609 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 from AS
2024-08-20 01:18:07,236 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 from AS
2024-08-20 01:18:29,309 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 28 from LS+wenet, 21 from Vox, 17 from AS
2024-08-20 01:18:48,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4606600.0, ans=0.125
2024-08-20 01:19:03,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4606700.0, ans=0.125
2024-08-20 01:19:08,579 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 18 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 01:19:13,290 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1300, loss[loss=0.09046, beats_loss=0.01068, ecapa_loss=0.0001336, whisper_loss=0.07845, over 13884.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01032, ecapa_loss=0.0001383, whisper_loss=0.08854, over 3748023.99 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:19:21,677 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS
2024-08-20 01:19:23,454 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 from AS
2024-08-20 01:19:25,720 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 20 from LS+wenet, 23 from Vox, 19 from AS
2024-08-20 01:19:28,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4606800.0, ans=0.95
2024-08-20 01:19:33,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4606900.0, ans=0.125
2024-08-20 01:19:45,758 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 10 from LS+wenet, 15 from Vox, 28 from AS
2024-08-20 01:21:03,542 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1350, loss[loss=0.1099, beats_loss=0.01115, ecapa_loss=0.0001512, whisper_loss=0.09725, over 22601.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01035, ecapa_loss=0.0001376, whisper_loss=0.08888, over 3747009.08 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:21:21,340 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 26 from LS+wenet, 13 from Vox, 34 from AS
2024-08-20 01:21:22,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.244e+01 2.406e+01 2.687e+01 4.080e+01, threshold=4.812e+01, percent-clipped=0.0
2024-08-20 01:21:31,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4607400.0, ans=0.0
2024-08-20 01:21:31,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2024-08-20 01:21:45,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4607400.0, ans=0.125
2024-08-20 01:21:59,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4607500.0, ans=0.125
2024-08-20 01:22:06,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4607500.0, ans=0.125
2024-08-20 01:22:34,322 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 from AS
2024-08-20 01:22:36,493 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 from AS
2024-08-20 01:23:07,434 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1400, loss[loss=0.1059, beats_loss=0.009577, ecapa_loss=0.0001145, whisper_loss=0.0952, over 14708.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001386, whisper_loss=0.08978, over 3727289.40 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:23:09,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4607800.0, ans=0.125
2024-08-20 01:23:31,105 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 from AS
2024-08-20 01:24:08,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4608000.0, ans=0.125
2024-08-20 01:24:11,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4608000.0, ans=0.125
2024-08-20 01:24:40,058 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 30 from Vox, 31 from AS
2024-08-20 01:24:47,589 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 from AS
2024-08-20 01:24:50,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4608200.0, ans=0.2
2024-08-20 01:24:56,752 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0
2024-08-20 01:25:06,542 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0315290167927742, model_norm_threshold=48.11598205566406
2024-08-20 01:25:06,704 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.963e+05, grad_sumsq=4.963e+05, orig_rms_sq=1.000e+00
2024-08-20 01:25:08,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4608300.0, ans=0.125
2024-08-20 01:25:09,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1450, loss[loss=0.09072, beats_loss=0.01252, ecapa_loss=0.00014, whisper_loss=0.07679, over 22745.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001385, whisper_loss=0.08926, over 3735723.42 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:25:13,011 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 11 from Vox, 31 from AS
2024-08-20 01:25:26,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.461e+01 2.741e+01 1.526e+03, threshold=4.922e+01, percent-clipped=2.0
2024-08-20 01:25:30,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4608300.0, ans=0.1
2024-08-20 01:25:48,078 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS
2024-08-20 01:25:50,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4608400.0, ans=0.125
2024-08-20 01:25:56,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4608500.0, ans=0.0
2024-08-20 01:26:22,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4608600.0, ans=0.125
2024-08-20 01:26:50,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=4608600.0, ans=0.02
2024-08-20 01:27:15,954 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 14 from LS+wenet, 16 from Vox, 36 from AS
2024-08-20 01:27:25,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4608700.0, ans=0.125
2024-08-20 01:27:30,732 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0
2024-08-20 01:27:31,045 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1500, loss[loss=0.1204, beats_loss=0.008651, ecapa_loss=0.0001352, whisper_loss=0.1104, over 17710.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0104, ecapa_loss=0.0001367, whisper_loss=0.08868, over 3701199.22 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:27:42,091 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 from AS
2024-08-20 01:27:42,396 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4608800.0, ans=0.1
2024-08-20 01:27:49,939 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 13 from LS+wenet, 26 from Vox, 28 from AS
2024-08-20 01:28:23,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4609000.0, ans=0.125
2024-08-20 01:28:23,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4609000.0, ans=0.0
2024-08-20 01:28:56,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4609200.0, ans=0.125
2024-08-20 01:28:58,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0
2024-08-20 01:28:59,979 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 24 from LS+wenet, 24 from Vox, 21 from AS
2024-08-20 01:29:04,037 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 from AS
2024-08-20 01:29:08,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0
2024-08-20 01:29:09,830 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 from AS
2024-08-20 01:29:13,095 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1550, loss[loss=0.0843, beats_loss=0.01216, ecapa_loss=0.000141, whisper_loss=0.07073, over 20476.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01045, ecapa_loss=0.0001366, whisper_loss=0.08813, over 3698752.83 frames. ], batch size: 82, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:29:27,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.175e+01 2.465e+01 2.675e+01 6.220e+01, threshold=4.930e+01, percent-clipped=1.0
2024-08-20 01:29:28,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4609300.0, ans=0.125
2024-08-20 01:29:45,262 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 from AS
2024-08-20 01:29:49,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4609400.0, ans=0.0
2024-08-20 01:30:02,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0
2024-08-20 01:30:05,667 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 18 from Vox, 22 from AS
2024-08-20 01:30:07,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4609500.0, ans=0.5
2024-08-20 01:30:11,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4609600.0, ans=0.1
2024-08-20 01:30:13,446 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 38 from LS+wenet, 18 from Vox, 36 from AS
2024-08-20 01:30:31,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4609700.0, ans=0.1
2024-08-20 01:30:36,886 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 31 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 01:30:45,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4609700.0, ans=0.125
2024-08-20 01:30:47,463 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=15.0
2024-08-20 01:30:49,735 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1600, loss[loss=0.09672, beats_loss=0.01069, ecapa_loss=0.0001343, whisper_loss=0.08469, over 15801.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01037, ecapa_loss=0.0001369, whisper_loss=0.0885, over 3675118.17 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:30:52,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4609800.0, ans=0.125
2024-08-20 01:30:59,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4609800.0, ans=0.125
2024-08-20 01:31:10,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4609900.0, ans=0.1
2024-08-20 01:31:41,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4610000.0, ans=0.0
2024-08-20 01:31:58,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4610100.0, ans=10.0
2024-08-20 01:31:58,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.43 vs. limit=10.0
2024-08-20 01:32:05,598 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 from AS
2024-08-20 01:32:24,531 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1650, loss[loss=0.1095, beats_loss=0.009813, ecapa_loss=0.0001283, whisper_loss=0.09845, over 17547.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01037, ecapa_loss=0.0001376, whisper_loss=0.08865, over 3717959.25 frames. ], batch size: 67, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:32:25,783 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4610300.0, ans=0.125
2024-08-20 01:32:39,673 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.232e+01 2.495e+01 2.715e+01 1.384e+02, threshold=4.990e+01, percent-clipped=1.0
2024-08-20 01:32:49,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4610400.0, ans=0.0
2024-08-20 01:33:25,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=15.0
2024-08-20 01:33:37,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4610600.0, ans=0.125
2024-08-20 01:33:42,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4610700.0, ans=0.0
2024-08-20 01:33:52,920 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 from AS
2024-08-20 01:33:57,988 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1700, loss[loss=0.1158, beats_loss=0.009664, ecapa_loss=0.00013, whisper_loss=0.1048, over 23175.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01029, ecapa_loss=0.0001378, whisper_loss=0.08888, over 3734935.63 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:34:01,802 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 10 from LS+wenet, 14 from Vox, 29 from AS
2024-08-20 01:34:04,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4610800.0, ans=0.1
2024-08-20 01:34:04,244 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 01:34:07,378 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 22 from LS+wenet, 15 from Vox, 48 from AS
2024-08-20 01:34:50,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4611100.0, ans=0.0
2024-08-20 01:35:09,322 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 from AS
2024-08-20 01:35:26,065 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1750, loss[loss=0.08963, beats_loss=0.009638, ecapa_loss=0.0001026, whisper_loss=0.07897, over 14216.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.01039, ecapa_loss=0.0001372, whisper_loss=0.08815, over 3729895.25 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:35:38,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.241e+01 2.449e+01 2.717e+01 4.269e+01, threshold=4.898e+01, percent-clipped=0.0
2024-08-20 01:35:43,734 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4611400.0, ans=0.0
2024-08-20 01:35:58,347 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0
2024-08-20 01:36:07,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4611500.0, ans=0.125
2024-08-20 01:36:08,834 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 from AS
2024-08-20 01:36:09,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4611500.0, ans=0.0
2024-08-20 01:36:16,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=12.0
2024-08-20 01:36:17,401 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 from AS
2024-08-20 01:36:21,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4611600.0, ans=0.125
2024-08-20 01:36:22,709 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 from AS
2024-08-20 01:36:24,595 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 20 from LS+wenet, 24 from Vox, 39 from AS
2024-08-20 01:36:28,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4611600.0, ans=0.2
2024-08-20 01:36:38,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4611700.0, ans=0.0
2024-08-20 01:36:52,763 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1800, loss[loss=0.1004, beats_loss=0.01113, ecapa_loss=0.0001011, whisper_loss=0.08828, over 19083.00 frames. ], tot_loss[loss=0.09976, beats_loss=0.01039, ecapa_loss=0.0001376, whisper_loss=0.08799, over 3716366.01 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:37:11,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4611900.0, ans=0.0
2024-08-20 01:37:19,608 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 from AS
2024-08-20 01:37:28,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4612000.0, ans=0.125
2024-08-20 01:37:36,252 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 from AS
2024-08-20 01:37:44,953 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 from AS
2024-08-20 01:37:46,858 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS
2024-08-20 01:37:48,627 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 from AS
2024-08-20 01:38:02,173 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 from AS
2024-08-20 01:38:04,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4612200.0, ans=0.09899494936611666
2024-08-20 01:38:18,888 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1850, loss[loss=0.1014, beats_loss=0.01178, ecapa_loss=0.0001156, whisper_loss=0.08845, over 22586.00 frames. ], tot_loss[loss=0.09934, beats_loss=0.01044, ecapa_loss=0.0001377, whisper_loss=0.08752, over 3725987.47 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:38:31,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.236e+01 2.438e+01 2.690e+01 3.613e+01, threshold=4.877e+01, percent-clipped=0.0
2024-08-20 01:39:08,080 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS
2024-08-20 01:39:10,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4612600.0, ans=0.1
2024-08-20 01:39:16,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4612600.0, ans=0.125
2024-08-20 01:39:21,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4612600.0, ans=0.125
2024-08-20 01:39:23,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4612600.0, ans=0.1
2024-08-20 01:39:43,592 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5
2024-08-20 01:39:47,175 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1900, loss[loss=0.09522, beats_loss=0.01319, ecapa_loss=0.0001073, whisper_loss=0.08096, over 17760.00 frames. ], tot_loss[loss=0.09981, beats_loss=0.01042, ecapa_loss=0.0001362, whisper_loss=0.08803, over 3761966.94 frames. ], batch size: 69, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:40:03,616 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 from AS
2024-08-20 01:40:05,890 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.469e+05
2024-08-20 01:40:20,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-08-20 01:40:44,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=4613100.0, ans=15.0
2024-08-20 01:40:56,123 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 16 from LS+wenet, 10 from Vox, 23 from AS
2024-08-20 01:41:02,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4613200.0, ans=0.09899494936611666
2024-08-20 01:41:05,817 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 25 from LS+wenet, 21 from Vox, 16 from AS
2024-08-20 01:41:14,213 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 1950, loss[loss=0.09645, beats_loss=0.007758, ecapa_loss=0.0001555, whisper_loss=0.08714, over 20401.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01033, ecapa_loss=0.0001364, whisper_loss=0.08833, over 3740145.37 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 01:41:26,075 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.348e+01 2.572e+01 2.844e+01 4.490e+01, threshold=5.144e+01, percent-clipped=0.0
2024-08-20 01:41:30,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4613400.0, ans=0.0
2024-08-20 01:41:42,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4613400.0, ans=0.0
2024-08-20 01:42:12,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0
2024-08-20 01:42:13,524 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 17 from LS+wenet, 28 from Vox, 43 from AS
2024-08-20 01:42:22,083 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.63 vs.
limit=22.5 2024-08-20 01:42:30,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4613700.0, ans=0.0 2024-08-20 01:42:34,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4613700.0, ans=0.125 2024-08-20 01:42:38,794 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 01:42:39,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4613800.0, ans=0.05 2024-08-20 01:42:39,938 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2000, loss[loss=0.1001, beats_loss=0.01013, ecapa_loss=0.0001655, whisper_loss=0.08828, over 18742.00 frames. ], tot_loss[loss=0.09941, beats_loss=0.01039, ecapa_loss=0.0001357, whisper_loss=0.08766, over 3699414.05 frames. ], batch size: 76, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:43:01,545 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 11 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 01:43:03,136 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 01:43:04,428 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0779990628361702, model_norm_threshold=51.44282531738281 2024-08-20 01:43:04,595 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.291e+04, grad_sumsq=4.291e+04, orig_rms_sq=1.000e+00 2024-08-20 01:43:10,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4613900.0, ans=0.0 2024-08-20 01:43:22,457 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
32 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 01:43:24,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4614000.0, ans=0.125 2024-08-20 01:44:07,476 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2050, loss[loss=0.09261, beats_loss=0.01238, ecapa_loss=0.0001369, whisper_loss=0.07886, over 22562.00 frames. ], tot_loss[loss=0.09963, beats_loss=0.0104, ecapa_loss=0.0001364, whisper_loss=0.08786, over 3706449.76 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:44:19,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.219e+01 2.452e+01 2.809e+01 6.595e+02, threshold=4.904e+01, percent-clipped=1.0 2024-08-20 01:44:46,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4614500.0, ans=0.1 2024-08-20 01:44:46,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4614500.0, ans=0.1 2024-08-20 01:45:00,807 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 37 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 01:45:03,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4614600.0, ans=0.1 2024-08-20 01:45:14,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-20 01:45:19,929 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 01:45:33,353 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2100, loss[loss=0.1058, beats_loss=0.01043, ecapa_loss=0.0001428, whisper_loss=0.09395, over 17902.00 frames. 
], tot_loss[loss=0.1, beats_loss=0.01042, ecapa_loss=0.000136, whisper_loss=0.08824, over 3719616.65 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:45:42,778 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-20 01:45:45,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2024-08-20 01:46:02,846 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 01:46:22,372 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 01:46:53,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4615200.0, ans=0.0 2024-08-20 01:46:55,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-20 01:46:56,632 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 01:46:58,092 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-08-20 01:47:00,094 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2150, loss[loss=0.1147, beats_loss=0.008674, ecapa_loss=0.0001514, whisper_loss=0.1045, over 15976.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.01046, ecapa_loss=0.0001349, whisper_loss=0.08812, over 3723062.50 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:47:00,305 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 01:47:04,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4615300.0, ans=0.1 2024-08-20 01:47:12,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.213e+01 2.411e+01 2.746e+01 4.203e+01, threshold=4.821e+01, percent-clipped=0.0 2024-08-20 01:47:14,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4615300.0, ans=0.125 2024-08-20 01:47:24,188 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 01:47:24,976 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=12.0 2024-08-20 01:47:41,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4615500.0, ans=0.05 2024-08-20 01:47:52,888 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 01:48:08,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4615700.0, ans=0.125 2024-08-20 01:48:25,547 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2200, loss[loss=0.112, beats_loss=0.009532, ecapa_loss=0.0001554, whisper_loss=0.1009, over 21430.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01042, ecapa_loss=0.0001356, whisper_loss=0.08817, over 3697536.53 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:48:29,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4615800.0, ans=0.0 2024-08-20 01:48:39,477 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
21 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-20 01:49:03,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.42 vs. limit=22.5 2024-08-20 01:49:12,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4616000.0, ans=15.0 2024-08-20 01:49:21,501 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 18 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 01:49:35,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4616200.0, ans=0.125 2024-08-20 01:49:39,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4616200.0, ans=0.0 2024-08-20 01:49:39,724 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.282e+01 2024-08-20 01:49:50,342 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2250, loss[loss=0.0982, beats_loss=0.01173, ecapa_loss=0.0001663, whisper_loss=0.0848, over 18298.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01048, ecapa_loss=0.0001348, whisper_loss=0.08821, over 3707440.30 frames. ], batch size: 79, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:50:01,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4616300.0, ans=0.1 2024-08-20 01:50:02,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.187e+01 2.427e+01 2.680e+01 3.409e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 01:50:05,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5 2024-08-20 01:50:15,309 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 01:50:26,981 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 01:50:39,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4616500.0, ans=0.2 2024-08-20 01:50:43,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4616600.0, ans=0.125 2024-08-20 01:50:51,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-20 01:50:52,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2024-08-20 01:51:15,685 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2300, loss[loss=0.08589, beats_loss=0.009724, ecapa_loss=0.0002032, whisper_loss=0.07413, over 18372.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001359, whisper_loss=0.08897, over 3712045.30 frames. 
], batch size: 80, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:51:52,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4617000.0, ans=0.2 2024-08-20 01:52:37,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4617200.0, ans=0.125 2024-08-20 01:52:39,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4617200.0, ans=0.125 2024-08-20 01:52:42,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4617300.0, ans=0.1 2024-08-20 01:52:43,083 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2350, loss[loss=0.1184, beats_loss=0.009314, ecapa_loss=0.0001274, whisper_loss=0.1078, over 14350.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01046, ecapa_loss=0.0001365, whisper_loss=0.08888, over 3733808.98 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:52:43,357 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 01:52:49,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4617300.0, ans=0.95 2024-08-20 01:52:55,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.315e+01 2.598e+01 2.990e+01 3.797e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-20 01:53:04,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4617400.0, ans=0.125 2024-08-20 01:53:11,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4617400.0, ans=0.0 2024-08-20 01:53:14,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4617400.0, ans=0.0 2024-08-20 01:53:36,854 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 33 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 01:53:49,437 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. limit=10.0 2024-08-20 01:53:52,774 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-08-20 01:53:59,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4617700.0, ans=0.125 2024-08-20 01:54:07,164 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2400, loss[loss=0.1104, beats_loss=0.01013, ecapa_loss=0.0001362, whisper_loss=0.09887, over 14039.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001375, whisper_loss=0.08901, over 3743484.34 frames. 
], batch size: 54, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:54:08,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2024-08-20 01:54:22,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4617800.0, ans=0.1 2024-08-20 01:54:41,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=22.5 2024-08-20 01:54:47,879 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4618000.0, ans=0.0 2024-08-20 01:54:59,010 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 01:55:04,586 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 01:55:15,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4618200.0, ans=0.2 2024-08-20 01:55:16,453 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 01:55:18,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4618200.0, ans=0.1 2024-08-20 01:55:18,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4618200.0, ans=0.1 2024-08-20 01:55:33,036 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2450, loss[loss=0.08042, beats_loss=0.01326, ecapa_loss=0.0001342, whisper_loss=0.06582, over 18003.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001371, whisper_loss=0.08881, over 3746055.32 frames. 
], batch size: 72, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:55:35,180 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 01:55:41,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.45 vs. limit=15.0 2024-08-20 01:55:45,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.204e+01 2.412e+01 2.711e+01 4.337e+02, threshold=4.825e+01, percent-clipped=1.0 2024-08-20 01:55:50,348 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 22 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-20 01:56:07,998 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-20 01:57:03,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-20 01:57:03,895 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2500, loss[loss=0.09882, beats_loss=0.008642, ecapa_loss=0.0001148, whisper_loss=0.08903, over 18113.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0104, ecapa_loss=0.0001372, whisper_loss=0.08906, over 3741617.32 frames. ], batch size: 67, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:57:09,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2024-08-20 01:57:18,420 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 01:58:00,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.28 vs. 
limit=12.0 2024-08-20 01:58:15,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4619200.0, ans=0.125 2024-08-20 01:58:22,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4619200.0, ans=0.125 2024-08-20 01:58:26,774 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 01:58:32,178 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2550, loss[loss=0.09429, beats_loss=0.01046, ecapa_loss=0.0001715, whisper_loss=0.08211, over 16701.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.000137, whisper_loss=0.08917, over 3760576.78 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:58:44,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5 2024-08-20 01:58:44,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.306e+01 2.523e+01 2.847e+01 3.512e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-20 01:58:45,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4619300.0, ans=0.125 2024-08-20 01:59:24,232 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.23 vs. limit=22.5 2024-08-20 01:59:29,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4619600.0, ans=0.125 2024-08-20 01:59:30,236 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 01:59:40,581 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 01:59:46,729 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 01:59:52,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4619700.0, ans=0.2 2024-08-20 01:59:55,180 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 02:00:01,014 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2600, loss[loss=0.08371, beats_loss=0.01226, ecapa_loss=0.0001227, whisper_loss=0.07022, over 18884.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.000138, whisper_loss=0.08917, over 3766867.26 frames. ], batch size: 78, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:00:04,425 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 02:00:06,021 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 20 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 02:00:12,750 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 02:00:20,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4619900.0, ans=0.0 2024-08-20 02:00:26,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4619900.0, ans=0.125 2024-08-20 02:00:43,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4620000.0, ans=0.0 2024-08-20 02:00:45,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4620000.0, ans=0.125 2024-08-20 02:00:59,642 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 02:01:11,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4620200.0, ans=0.125 2024-08-20 02:01:17,678 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 22 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-20 02:01:30,014 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2650, loss[loss=0.1009, beats_loss=0.00948, ecapa_loss=0.000172, whisper_loss=0.0897, over 21041.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001369, whisper_loss=0.08937, over 3749247.54 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:01:31,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4620300.0, ans=0.125 2024-08-20 02:01:42,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.354e+01 2.571e+01 2.953e+01 6.961e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-20 02:01:55,044 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 02:01:57,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4620400.0, ans=0.125 2024-08-20 02:02:02,151 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 02:02:04,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-20 02:02:06,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. 
limit=22.5 2024-08-20 02:02:13,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4620500.0, ans=0.125 2024-08-20 02:02:17,450 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 23 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 02:02:18,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4620500.0, ans=0.07 2024-08-20 02:02:27,628 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 02:02:35,326 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 02:02:49,252 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 20 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-20 02:02:58,542 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2700, loss[loss=0.08546, beats_loss=0.0136, ecapa_loss=9.98e-05, whisper_loss=0.07086, over 15917.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01041, ecapa_loss=0.000138, whisper_loss=0.08907, over 3770799.55 frames. 
], batch size: 62, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:03:08,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4620800.0, ans=0.0 2024-08-20 02:03:15,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4620900.0, ans=0.2 2024-08-20 02:03:16,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4620900.0, ans=0.0 2024-08-20 02:03:49,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4621100.0, ans=0.125 2024-08-20 02:03:54,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4621100.0, ans=0.1 2024-08-20 02:04:03,768 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.751e-02 2024-08-20 02:04:12,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4621200.0, ans=0.0 2024-08-20 02:04:15,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4621200.0, ans=0.1 2024-08-20 02:04:17,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-20 02:04:24,792 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2750, loss[loss=0.1036, beats_loss=0.008015, ecapa_loss=0.0001601, whisper_loss=0.09399, over 17095.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0104, ecapa_loss=0.0001379, whisper_loss=0.08852, over 3733926.22 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:04:27,032 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 02:04:32,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4621300.0, ans=0.1 2024-08-20 02:04:36,896 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.512e+01 2.707e+01 3.446e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 02:04:43,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4621400.0, ans=0.0 2024-08-20 02:04:54,625 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.648e-01 2024-08-20 02:05:07,625 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 22 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 02:05:30,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4621600.0, ans=0.0 2024-08-20 02:05:46,922 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:05:53,029 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2800, loss[loss=0.09266, beats_loss=0.01133, ecapa_loss=0.0001231, whisper_loss=0.08011, over 17507.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001374, whisper_loss=0.08886, over 3733241.84 frames. ], batch size: 70, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:05:53,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4621800.0, ans=0.2 2024-08-20 02:06:25,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4621900.0, ans=0.125 2024-08-20 02:06:45,824 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 02:07:12,650 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
29 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-20 02:07:22,913 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2850, loss[loss=0.1126, beats_loss=0.009454, ecapa_loss=0.0001533, whisper_loss=0.1017, over 14615.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01035, ecapa_loss=0.0001379, whisper_loss=0.08917, over 3732315.60 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:07:25,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4622300.0, ans=0.125 2024-08-20 02:07:27,106 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 02:07:34,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4622300.0, ans=0.125 2024-08-20 02:07:35,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.251e+01 2.480e+01 2.760e+01 4.318e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-20 02:07:37,344 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 02:08:07,856 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4622500.0, ans=0.0 2024-08-20 02:08:21,944 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 02:08:50,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4622700.0, ans=0.09899494936611666 2024-08-20 02:08:52,966 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2900, loss[loss=0.09096, beats_loss=0.01223, ecapa_loss=0.0001536, whisper_loss=0.0772, over 21430.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01025, ecapa_loss=0.0001385, whisper_loss=0.08961, over 3701999.81 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:08:55,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4622800.0, ans=0.125 2024-08-20 02:08:58,385 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 13 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 02:09:03,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4622800.0, ans=0.125 2024-08-20 02:09:04,224 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 02:09:10,776 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 29 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 02:09:40,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4623000.0, ans=0.2 2024-08-20 02:10:15,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4623200.0, ans=0.5 2024-08-20 02:10:20,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4623300.0, ans=0.125 2024-08-20 02:10:22,384 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 2950, loss[loss=0.1037, beats_loss=0.009522, ecapa_loss=0.0001339, whisper_loss=0.09288, over 20961.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001391, whisper_loss=0.0893, over 3752445.01 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:10:22,582 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 02:10:30,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4623300.0, ans=0.1 2024-08-20 02:10:34,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.285e+01 2.491e+01 2.729e+01 3.693e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 02:10:45,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4623400.0, ans=0.125 2024-08-20 02:10:50,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4623400.0, ans=0.0 2024-08-20 02:11:04,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4623500.0, ans=0.125 2024-08-20 02:11:06,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4623500.0, ans=0.05 2024-08-20 02:11:06,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4623500.0, ans=0.125 2024-08-20 02:11:08,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=15.0 2024-08-20 02:11:09,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4623500.0, ans=0.1 2024-08-20 02:11:11,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4623500.0, ans=0.0 2024-08-20 02:11:15,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4623600.0, ans=0.125 2024-08-20 02:11:25,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4623600.0, ans=0.04949747468305833 2024-08-20 02:11:33,928 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 02:11:36,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4623700.0, ans=0.025 2024-08-20 02:11:42,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4623700.0, ans=0.125 2024-08-20 02:11:44,102 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 02:11:48,925 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3000, loss[loss=0.1228, beats_loss=0.0112, ecapa_loss=0.0001398, whisper_loss=0.1102, over 23535.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001404, whisper_loss=0.08969, over 3752637.41 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:11:48,926 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 02:12:25,535 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.000511, whisper_loss=0.249, over 931116.00 frames. 
2024-08-20 02:12:46,532 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on SV_voxceleb1: loss=0.003941, beats_loss=0, ecapa_loss=0.0003941, whisper_loss=0, over 944235.00 frames. 2024-08-20 02:13:14,633 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9317, 2.1980, 2.3051, 2.1524], device='cuda:2') 2024-08-20 02:14:20,928 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on AT_audioset: loss=0.02293, beats_loss=0.02293, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 02:14:20,932 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 02:14:23,506 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 12 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 02:15:07,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4624000.0, ans=0.1 2024-08-20 02:15:09,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4624100.0, ans=0.1 2024-08-20 02:15:21,066 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 02:15:44,440 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3050, loss[loss=0.1185, beats_loss=0.009244, ecapa_loss=0.0001445, whisper_loss=0.1078, over 20561.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.0905, over 3804569.48 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:15:53,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.28 vs. 
limit=22.5 2024-08-20 02:15:56,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.348e+01 2.639e+01 2.982e+01 8.249e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-20 02:16:06,477 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 02:16:07,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-20 02:16:17,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4624500.0, ans=0.1 2024-08-20 02:16:20,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4624500.0, ans=0.1 2024-08-20 02:16:28,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4624500.0, ans=0.125 2024-08-20 02:17:08,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4624800.0, ans=0.0 2024-08-20 02:17:09,618 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3100, loss[loss=0.1033, beats_loss=0.009467, ecapa_loss=0.0001619, whisper_loss=0.09217, over 22821.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09133, over 3832082.99 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:17:18,099 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 02:17:23,053 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
19 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 02:17:24,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4624800.0, ans=0.125 2024-08-20 02:17:30,955 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.423e+00 2024-08-20 02:17:47,563 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 02:17:54,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4625000.0, ans=0.125 2024-08-20 02:17:59,435 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.081e-03 2024-08-20 02:18:16,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4625200.0, ans=0.0 2024-08-20 02:18:19,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4625200.0, ans=0.0 2024-08-20 02:18:33,641 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3150, loss[loss=0.1208, beats_loss=0.009529, ecapa_loss=0.0001475, whisper_loss=0.1098, over 15621.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01031, ecapa_loss=0.0001409, whisper_loss=0.09153, over 3799671.44 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:18:36,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4625300.0, ans=0.1 2024-08-20 02:18:44,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.90 vs. 
limit=15.0 2024-08-20 02:18:44,616 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.262e+01 2.448e+01 2.716e+01 4.425e+01, threshold=4.896e+01, percent-clipped=0.0 2024-08-20 02:18:54,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-08-20 02:19:09,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4625500.0, ans=0.2 2024-08-20 02:19:24,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4625600.0, ans=0.125 2024-08-20 02:19:31,668 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 02:19:32,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4625600.0, ans=0.125 2024-08-20 02:19:40,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4625700.0, ans=0.2 2024-08-20 02:19:44,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4625700.0, ans=0.125 2024-08-20 02:19:44,226 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.420e+00 2024-08-20 02:19:56,717 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3200, loss[loss=0.09726, beats_loss=0.01234, ecapa_loss=0.0001644, whisper_loss=0.08328, over 16903.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.09104, over 3798378.89 frames. 
], batch size: 70, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:20:09,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4625800.0, ans=0.2 2024-08-20 02:20:35,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4626000.0, ans=0.2 2024-08-20 02:21:03,430 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.99 vs. limit=6.0 2024-08-20 02:21:10,514 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 02:21:14,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4626200.0, ans=0.1 2024-08-20 02:21:20,039 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3250, loss[loss=0.1009, beats_loss=0.007184, ecapa_loss=0.0001896, whisper_loss=0.09184, over 15754.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09126, over 3830441.08 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:21:32,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.285e+01 2.517e+01 2.834e+01 4.980e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-20 02:21:35,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4626300.0, ans=0.125 2024-08-20 02:21:38,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4626400.0, ans=0.2 2024-08-20 02:21:50,565 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 02:21:56,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4626500.0, ans=0.0 2024-08-20 02:22:25,220 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 17 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 02:22:38,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.86 vs. limit=22.5 2024-08-20 02:22:46,851 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3300, loss[loss=0.09504, beats_loss=0.009531, ecapa_loss=0.0001705, whisper_loss=0.0838, over 15608.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09086, over 3795008.37 frames. ], batch size: 65, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:23:14,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4626900.0, ans=0.125 2024-08-20 02:23:27,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.75 vs. limit=10.0 2024-08-20 02:23:28,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4627000.0, ans=0.125 2024-08-20 02:23:39,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4627100.0, ans=0.2 2024-08-20 02:23:51,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4627200.0, ans=0.125 2024-08-20 02:24:03,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4627200.0, ans=0.125 2024-08-20 02:24:07,635 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
26 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 02:24:08,864 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3350, loss[loss=0.1054, beats_loss=0.009188, ecapa_loss=0.0001683, whisper_loss=0.09451, over 18776.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001417, whisper_loss=0.09098, over 3800287.86 frames. ], batch size: 76, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:24:12,952 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 02:24:13,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4627300.0, ans=0.0 2024-08-20 02:24:13,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4627300.0, ans=0.0 2024-08-20 02:24:20,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.203e+01 2.406e+01 2.784e+01 4.307e+01, threshold=4.813e+01, percent-clipped=0.0 2024-08-20 02:24:37,447 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 02:24:48,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4627500.0, ans=0.125 2024-08-20 02:24:52,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-08-20 02:24:52,742 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 02:25:16,027 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
18 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 02:25:20,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4627700.0, ans=0.05 2024-08-20 02:25:32,821 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3400, loss[loss=0.1029, beats_loss=0.0102, ecapa_loss=0.0001632, whisper_loss=0.09104, over 18162.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001416, whisper_loss=0.09082, over 3829601.81 frames. ], batch size: 76, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:26:17,045 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-20 02:26:18,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4628000.0, ans=0.09899494936611666 2024-08-20 02:26:38,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4628200.0, ans=0.5 2024-08-20 02:26:39,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4628200.0, ans=0.035 2024-08-20 02:26:55,321 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3450, loss[loss=0.1097, beats_loss=0.009038, ecapa_loss=0.0001557, whisper_loss=0.09909, over 22622.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001404, whisper_loss=0.0912, over 3810483.84 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:27:01,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4628300.0, ans=0.0 2024-08-20 02:27:06,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4628300.0, ans=10.0 2024-08-20 02:27:07,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.276e+01 2.600e+01 2.959e+01 4.699e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-20 02:27:23,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4628400.0, ans=0.1 2024-08-20 02:27:23,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4628400.0, ans=0.2 2024-08-20 02:27:25,013 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 18 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-20 02:27:36,791 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 02:27:42,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4628500.0, ans=0.125 2024-08-20 02:28:04,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4628700.0, ans=0.0 2024-08-20 02:28:19,445 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3500, loss[loss=0.09829, beats_loss=0.009449, ecapa_loss=0.0001016, whisper_loss=0.08783, over 14239.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.0908, over 3805452.91 frames. 
], batch size: 52, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:28:26,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4628800.0, ans=0.07 2024-08-20 02:28:32,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4628800.0, ans=0.125 2024-08-20 02:28:46,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4628900.0, ans=0.1 2024-08-20 02:28:52,718 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 02:28:52,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4629000.0, ans=10.0 2024-08-20 02:29:04,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4629000.0, ans=0.125 2024-08-20 02:29:04,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4629000.0, ans=0.0 2024-08-20 02:29:06,390 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 02:29:11,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4629100.0, ans=0.125 2024-08-20 02:29:34,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4629200.0, ans=0.0 2024-08-20 02:29:38,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-20 02:29:44,390 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3550, loss[loss=0.08036, beats_loss=0.01004, ecapa_loss=0.0001405, whisper_loss=0.06892, over 14759.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.09031, over 3785872.28 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:29:56,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.386e+01 2.605e+01 2.983e+01 3.766e+02, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 02:29:59,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4629300.0, ans=0.0 2024-08-20 02:30:18,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4629500.0, ans=0.125 2024-08-20 02:30:30,122 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 02:30:47,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4629600.0, ans=0.2 2024-08-20 02:30:49,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4629600.0, ans=0.125 2024-08-20 02:31:04,891 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 02:31:13,841 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 02:31:24,048 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3600, loss[loss=0.09176, beats_loss=0.0105, ecapa_loss=0.0001866, whisper_loss=0.0794, over 22055.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.09033, over 3817626.37 frames. 
], batch size: 94, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:31:25,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4629800.0, ans=0.125 2024-08-20 02:31:48,092 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.480e-01 2024-08-20 02:31:56,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4629900.0, ans=0.07 2024-08-20 02:32:25,992 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 02:32:28,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4630000.0, ans=0.1 2024-08-20 02:32:46,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4630100.0, ans=0.0 2024-08-20 02:32:48,043 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 02:32:53,350 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:33:12,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4630200.0, ans=0.125 2024-08-20 02:33:14,304 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 02:33:15,275 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3650, loss[loss=0.09687, beats_loss=0.01275, ecapa_loss=0.0001064, whisper_loss=0.08306, over 18714.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001402, whisper_loss=0.08979, over 3795487.10 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:33:21,617 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 02:33:29,584 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.219e+01 2.439e+01 2.661e+01 4.108e+01, threshold=4.879e+01, percent-clipped=0.0 2024-08-20 02:33:57,135 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 02:34:01,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4630500.0, ans=0.125 2024-08-20 02:34:01,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4630500.0, ans=0.1 2024-08-20 02:34:05,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4630500.0, ans=0.125 2024-08-20 02:34:18,619 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-08-20 02:34:38,902 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 02:35:06,825 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3700, loss[loss=0.1042, beats_loss=0.01199, ecapa_loss=0.0001423, whisper_loss=0.09075, over 14984.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001406, whisper_loss=0.08952, over 3784939.71 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:35:08,631 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 02:35:17,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4630800.0, ans=0.125 2024-08-20 02:35:26,002 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
13 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 02:35:28,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4630900.0, ans=0.1 2024-08-20 02:35:31,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4630900.0, ans=0.0 2024-08-20 02:35:57,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-20 02:35:59,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4631000.0, ans=0.125 2024-08-20 02:36:23,982 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 02:36:26,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2024-08-20 02:36:49,848 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 02:36:54,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4631200.0, ans=0.0 2024-08-20 02:36:55,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4631200.0, ans=0.125 2024-08-20 02:36:58,529 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3750, loss[loss=0.1178, beats_loss=0.009724, ecapa_loss=0.0001319, whisper_loss=0.1067, over 23336.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.08966, over 3785686.22 frames. 
], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:37:12,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4631300.0, ans=0.0 2024-08-20 02:37:13,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.276e+01 2.480e+01 2.901e+01 4.929e+01, threshold=4.959e+01, percent-clipped=1.0 2024-08-20 02:37:14,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4631300.0, ans=0.125 2024-08-20 02:37:16,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4631300.0, ans=0.015 2024-08-20 02:37:16,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4631300.0, ans=0.125 2024-08-20 02:37:22,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4631400.0, ans=0.015 2024-08-20 02:37:22,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4631400.0, ans=0.0 2024-08-20 02:37:35,377 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 10 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-20 02:37:46,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4631500.0, ans=0.1 2024-08-20 02:37:49,289 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 02:37:55,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4631500.0, ans=0.1 2024-08-20 02:37:57,202 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 02:37:57,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4631500.0, ans=0.0 2024-08-20 02:37:57,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4631500.0, ans=6.0 2024-08-20 02:37:59,132 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 15 from LS+wenet, 7 from Vox, 37 fro AS 2024-08-20 02:38:01,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4631600.0, ans=0.0 2024-08-20 02:38:47,586 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3800, loss[loss=0.104, beats_loss=0.01208, ecapa_loss=0.0001493, whisper_loss=0.09039, over 13644.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.09078, over 3791471.07 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:39:27,662 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-08-20 02:39:29,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4631900.0, ans=0.125 2024-08-20 02:39:48,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4632000.0, ans=0.04949747468305833 2024-08-20 02:40:15,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. 
limit=15.0 2024-08-20 02:40:36,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4632200.0, ans=0.0 2024-08-20 02:40:40,663 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3850, loss[loss=0.09838, beats_loss=0.01073, ecapa_loss=0.0001592, whisper_loss=0.08606, over 19381.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.09089, over 3811668.06 frames. ], batch size: 77, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:40:55,854 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.447e+01 2.712e+01 3.131e+01 3.132e+02, threshold=5.425e+01, percent-clipped=6.0 2024-08-20 02:41:15,326 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.02 vs. limit=6.0 2024-08-20 02:41:26,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4632500.0, ans=0.125 2024-08-20 02:41:39,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4632500.0, ans=0.0 2024-08-20 02:41:45,523 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 02:42:13,702 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 02:42:22,120 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 02:42:26,920 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3900, loss[loss=0.08604, beats_loss=0.01282, ecapa_loss=0.0001343, whisper_loss=0.07187, over 17745.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.09089, over 3845959.52 frames. 
], batch size: 77, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:42:28,254 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 02:42:44,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4632800.0, ans=0.5 2024-08-20 02:42:50,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4632900.0, ans=0.09899494936611666 2024-08-20 02:43:04,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4632900.0, ans=0.2 2024-08-20 02:43:34,435 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 02:43:49,893 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 02:44:17,845 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 3950, loss[loss=0.1098, beats_loss=0.008691, ecapa_loss=0.0001579, whisper_loss=0.09952, over 21184.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.09064, over 3842065.43 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:44:23,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4633300.0, ans=0.125 2024-08-20 02:44:25,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4633300.0, ans=0.125 2024-08-20 02:44:33,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.316e+01 2.520e+01 2.771e+01 2.265e+02, threshold=5.040e+01, percent-clipped=2.0 2024-08-20 02:44:59,405 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
32 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 02:45:10,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4633500.0, ans=0.125 2024-08-20 02:45:12,463 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 02:45:48,851 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 02:46:06,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4633700.0, ans=0.1 2024-08-20 02:46:09,264 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4000, loss[loss=0.1232, beats_loss=0.00866, ecapa_loss=0.0001679, whisper_loss=0.1129, over 23091.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09129, over 3864696.64 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:46:10,896 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5 2024-08-20 02:46:15,259 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.172e-01 2024-08-20 02:46:19,437 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 02:47:00,093 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 02:47:02,244 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-20 02:47:08,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. 
limit=12.0 2024-08-20 02:47:29,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4634100.0, ans=0.0 2024-08-20 02:47:55,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-20 02:48:00,251 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 02:48:05,836 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4050, loss[loss=0.08013, beats_loss=0.01266, ecapa_loss=0.0001443, whisper_loss=0.06602, over 21310.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.09109, over 3849936.39 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:48:17,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4634300.0, ans=0.125 2024-08-20 02:48:22,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.303e+01 2.496e+01 2.881e+01 4.421e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-20 02:48:51,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4634400.0, ans=0.0 2024-08-20 02:49:20,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4634600.0, ans=0.0 2024-08-20 02:49:26,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. 
limit=15.0 2024-08-20 02:49:49,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4634700.0, ans=0.0 2024-08-20 02:50:06,618 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4100, loss[loss=0.1007, beats_loss=0.009931, ecapa_loss=0.0001259, whisper_loss=0.08946, over 15412.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.09124, over 3890641.60 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:50:12,106 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 16 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 02:50:20,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4634800.0, ans=0.09899494936611666 2024-08-20 02:50:38,224 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-20 02:50:38,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4634900.0, ans=0.125 2024-08-20 02:50:50,016 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 02:51:24,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4635100.0, ans=0.1 2024-08-20 02:51:34,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4635100.0, ans=0.125 2024-08-20 02:51:34,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.57 vs. limit=10.0 2024-08-20 02:51:39,917 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 
24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 02:51:55,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4635200.0, ans=0.1 2024-08-20 02:52:01,060 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4150, loss[loss=0.0771, beats_loss=0.0104, ecapa_loss=0.000151, whisper_loss=0.0652, over 15582.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.09056, over 3892681.57 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:52:04,746 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 32 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 02:52:06,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4635300.0, ans=0.1 2024-08-20 02:52:11,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4635300.0, ans=0.0 2024-08-20 02:52:15,142 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 02:52:16,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.376e+01 2.677e+01 2.991e+01 4.680e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-20 02:52:20,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0 2024-08-20 02:52:28,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4635400.0, ans=0.125 2024-08-20 02:52:39,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.55 vs. 
limit=12.0 2024-08-20 02:52:46,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4635500.0, ans=0.125 2024-08-20 02:53:15,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2024-08-20 02:53:18,109 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 02:53:23,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4635600.0, ans=0.2 2024-08-20 02:53:43,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4635700.0, ans=0.1 2024-08-20 02:53:52,210 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4200, loss[loss=0.1032, beats_loss=0.009221, ecapa_loss=0.00014, whisper_loss=0.09255, over 16802.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.09074, over 3861141.66 frames. ], batch size: 67, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:53:54,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2024-08-20 02:53:55,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4635800.0, ans=0.125 2024-08-20 02:54:18,476 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. limit=10.0 2024-08-20 02:54:47,391 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
25 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-20 02:54:52,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4636000.0, ans=0.125 2024-08-20 02:55:11,676 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 02:55:48,360 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4250, loss[loss=0.1084, beats_loss=0.01107, ecapa_loss=0.0001549, whisper_loss=0.09578, over 19323.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001398, whisper_loss=0.09025, over 3852573.21 frames. ], batch size: 83, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:56:06,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.193e+01 2.444e+01 2.797e+01 4.359e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-20 02:56:15,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4636400.0, ans=0.125 2024-08-20 02:56:37,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4636500.0, ans=0.1 2024-08-20 02:56:57,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4636500.0, ans=0.125 2024-08-20 02:57:03,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4636600.0, ans=0.125 2024-08-20 02:57:06,298 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 02:57:33,152 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.08 vs. 
limit=15.0 2024-08-20 02:57:39,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4636700.0, ans=0.125 2024-08-20 02:57:45,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4636700.0, ans=0.125 2024-08-20 02:57:48,311 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4300, loss[loss=0.08939, beats_loss=0.01384, ecapa_loss=0.0001046, whisper_loss=0.0745, over 20155.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01061, ecapa_loss=0.0001413, whisper_loss=0.08941, over 3800894.01 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:57:50,223 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-20 02:57:57,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4636800.0, ans=0.0 2024-08-20 02:58:13,552 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
17 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 02:58:21,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4636900.0, ans=0.125 2024-08-20 02:58:26,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4636900.0, ans=0.0 2024-08-20 02:58:26,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4636900.0, ans=0.125 2024-08-20 02:59:06,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4637100.0, ans=0.1 2024-08-20 02:59:06,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4637100.0, ans=0.125 2024-08-20 02:59:24,108 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 02:59:32,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2024-08-20 02:59:43,210 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 10 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 02:59:52,019 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4350, loss[loss=0.1245, beats_loss=0.006653, ecapa_loss=0.0001583, whisper_loss=0.1162, over 17409.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001401, whisper_loss=0.0898, over 3826822.87 frames. 
], batch size: 66, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:00:06,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4637300.0, ans=0.025 2024-08-20 03:00:08,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.299e+01 2.481e+01 2.858e+01 4.859e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 03:00:13,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4637300.0, ans=0.125 2024-08-20 03:00:16,249 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 28 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 03:00:23,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4637400.0, ans=0.125 2024-08-20 03:00:52,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4637500.0, ans=0.125 2024-08-20 03:01:20,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4637600.0, ans=0.2 2024-08-20 03:01:20,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-08-20 03:01:41,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=12.0 2024-08-20 03:01:53,496 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4400, loss[loss=0.1076, beats_loss=0.008474, ecapa_loss=0.0001617, whisper_loss=0.09755, over 16333.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.00014, whisper_loss=0.08956, over 3793873.04 frames. 
], batch size: 63, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:02:03,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-08-20 03:02:18,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4637900.0, ans=0.2 2024-08-20 03:02:20,258 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 03:02:20,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4637900.0, ans=0.0 2024-08-20 03:02:20,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2024-08-20 03:02:29,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4637900.0, ans=0.125 2024-08-20 03:02:29,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4637900.0, ans=0.0 2024-08-20 03:02:54,661 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 13 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 03:03:08,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4638100.0, ans=0.0 2024-08-20 03:03:08,952 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:03:09,077 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. 
limit=6.0 2024-08-20 03:03:21,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-20 03:03:24,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-08-20 03:03:50,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4638200.0, ans=0.0 2024-08-20 03:03:56,303 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4450, loss[loss=0.09791, beats_loss=0.0107, ecapa_loss=0.0001243, whisper_loss=0.08597, over 17207.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001404, whisper_loss=0.08898, over 3793541.73 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:04:12,155 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 30 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 03:04:12,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.659e+01 2.158e+01 2.452e+01 2.719e+01 3.768e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-20 03:04:17,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4638300.0, ans=0.1 2024-08-20 03:04:32,218 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:04:36,892 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 
16 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-20 03:05:22,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4638600.0, ans=0.125 2024-08-20 03:05:25,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4638600.0, ans=0.0 2024-08-20 03:05:33,045 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.90 vs. limit=15.0 2024-08-20 03:05:39,604 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 03:05:44,762 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 03:05:51,037 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 03:06:00,013 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4500, loss[loss=0.1166, beats_loss=0.009551, ecapa_loss=0.0001659, whisper_loss=0.1054, over 21361.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.0001404, whisper_loss=0.08902, over 3767487.73 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:06:48,778 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
34 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-20 03:07:20,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4639100.0, ans=0.1 2024-08-20 03:07:30,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4639100.0, ans=0.125 2024-08-20 03:08:04,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4639300.0, ans=0.125 2024-08-20 03:08:05,423 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4550, loss[loss=0.09967, beats_loss=0.007973, ecapa_loss=0.0001449, whisper_loss=0.09025, over 15023.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001401, whisper_loss=0.08976, over 3799041.07 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:08:23,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.329e+01 2.605e+01 2.856e+01 5.309e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 03:08:34,776 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 03:08:43,890 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 03:08:59,199 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 03:09:06,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4639500.0, ans=0.0 2024-08-20 03:09:16,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4639500.0, ans=0.125 2024-08-20 03:09:26,760 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 03:09:32,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4639600.0, ans=0.0 2024-08-20 03:09:32,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0 2024-08-20 03:10:13,199 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4600, loss[loss=0.1266, beats_loss=0.007687, ecapa_loss=0.0001623, whisper_loss=0.1173, over 22107.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.09041, over 3812333.13 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:10:14,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4639800.0, ans=0.5 2024-08-20 03:10:27,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-20 03:10:58,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4639900.0, ans=0.0 2024-08-20 03:11:52,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4640100.0, ans=0.125 2024-08-20 03:11:54,813 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-20 03:12:03,171 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. 
limit=6.0 2024-08-20 03:12:08,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4640200.0, ans=22.5 2024-08-20 03:12:24,828 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4650, loss[loss=0.0769, beats_loss=0.01058, ecapa_loss=0.0001546, whisper_loss=0.06477, over 22188.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001402, whisper_loss=0.08942, over 3839362.11 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:12:27,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0 2024-08-20 03:12:41,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.331e+01 2.446e+01 2.750e+01 3.848e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-20 03:12:53,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4640400.0, ans=0.125 2024-08-20 03:12:55,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4640400.0, ans=0.1 2024-08-20 03:13:00,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4640400.0, ans=0.0 2024-08-20 03:13:10,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-20 03:13:12,094 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 03:13:15,028 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 03:13:19,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4640500.0, ans=0.0 2024-08-20 03:13:22,739 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2024-08-20 03:13:45,509 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 03:13:48,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4640600.0, ans=0.0 2024-08-20 03:14:06,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4640700.0, ans=0.0 2024-08-20 03:14:30,533 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4700, loss[loss=0.08504, beats_loss=0.01137, ecapa_loss=0.0001139, whisper_loss=0.07253, over 13768.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01061, ecapa_loss=0.0001393, whisper_loss=0.0888, over 3803252.02 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:15:02,838 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 03:15:07,899 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 14 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 03:15:18,705 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2024-08-20 03:15:26,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4641000.0, ans=0.0 2024-08-20 03:15:56,792 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 03:15:58,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4641100.0, ans=0.125 2024-08-20 03:16:34,888 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4750, loss[loss=0.1292, beats_loss=0.007024, ecapa_loss=0.0001602, whisper_loss=0.1206, over 22143.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001404, whisper_loss=0.08972, over 3796815.16 frames. ], batch size: 86, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:16:53,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.351e+01 2.626e+01 2.955e+01 4.641e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-20 03:16:54,609 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 03:16:56,016 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 26 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 03:17:13,743 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 03:17:13,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4641400.0, ans=0.125 2024-08-20 03:17:18,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4641400.0, ans=0.125 2024-08-20 03:17:20,975 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 03:17:39,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.08 vs. 
limit=15.0 2024-08-20 03:17:58,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4641600.0, ans=0.1 2024-08-20 03:18:32,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4641700.0, ans=0.035 2024-08-20 03:18:40,914 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4800, loss[loss=0.08939, beats_loss=0.01254, ecapa_loss=0.00012, whisper_loss=0.07564, over 13828.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001413, whisper_loss=0.08971, over 3811439.09 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:18:48,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4641800.0, ans=0.125 2024-08-20 03:18:55,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4641800.0, ans=0.125 2024-08-20 03:19:03,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4641800.0, ans=0.125 2024-08-20 03:19:29,846 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 03:19:52,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4642000.0, ans=0.1 2024-08-20 03:19:57,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4642100.0, ans=0.125 2024-08-20 03:20:11,178 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 03:20:11,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4642100.0, ans=0.04949747468305833 2024-08-20 03:20:13,572 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 03:20:16,292 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 15 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 03:20:40,435 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 03:20:46,510 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4850, loss[loss=0.08772, beats_loss=0.009641, ecapa_loss=0.0001259, whisper_loss=0.07682, over 21085.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.08948, over 3811875.33 frames. ], batch size: 82, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:20:57,310 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 03:21:02,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.316e+01 2.589e+01 3.055e+01 7.163e+01, threshold=5.178e+01, percent-clipped=1.0 2024-08-20 03:21:04,129 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 03:21:22,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4642400.0, ans=0.125 2024-08-20 03:21:25,079 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 03:21:34,970 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. 
limit=10.0 2024-08-20 03:21:46,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4642500.0, ans=0.2 2024-08-20 03:21:53,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4642500.0, ans=0.125 2024-08-20 03:21:55,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4642600.0, ans=0.2 2024-08-20 03:21:58,794 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-20 03:22:08,019 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 18 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-20 03:22:23,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4642700.0, ans=0.5 2024-08-20 03:22:35,261 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4900, loss[loss=0.1027, beats_loss=0.009066, ecapa_loss=0.0001655, whisper_loss=0.09197, over 13847.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001426, whisper_loss=0.08968, over 3816627.96 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:22:47,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4642800.0, ans=0.125 2024-08-20 03:22:55,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4642900.0, ans=0.125 2024-08-20 03:22:55,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=22.5 2024-08-20 03:23:03,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4642900.0, ans=0.125 2024-08-20 03:23:13,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4643000.0, ans=0.1 2024-08-20 03:23:15,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4643000.0, ans=0.1 2024-08-20 03:23:22,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4643000.0, ans=0.125 2024-08-20 03:23:40,628 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 03:24:01,445 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 10 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 03:24:20,431 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 4950, loss[loss=0.09419, beats_loss=0.008715, ecapa_loss=0.0001943, whisper_loss=0.08353, over 19145.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001423, whisper_loss=0.08977, over 3823485.39 frames. ], batch size: 83, lr: 1.92e-03, grad_scale: 1.152921504606847e+18 2024-08-20 03:24:34,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.306e+01 2.561e+01 2.855e+01 3.879e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-20 03:24:37,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4643300.0, ans=0.125 2024-08-20 03:25:19,471 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
39 from LS+wenet, 9 from Vox, 43 fro AS 2024-08-20 03:25:22,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4643600.0, ans=0.0 2024-08-20 03:25:29,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.65 vs. limit=10.0 2024-08-20 03:25:54,389 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 39 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 03:25:55,356 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5000, loss[loss=0.1251, beats_loss=0.009521, ecapa_loss=0.0001344, whisper_loss=0.1142, over 24192.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001412, whisper_loss=0.09011, over 3816584.91 frames. ], batch size: 96, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:26:11,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4643800.0, ans=0.125 2024-08-20 03:26:17,099 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2024-08-20 03:26:37,240 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-20 03:26:55,516 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 03:26:59,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4644100.0, ans=0.035 2024-08-20 03:27:04,096 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 03:27:27,632 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5050, loss[loss=0.1238, beats_loss=0.009516, ecapa_loss=0.0001447, whisper_loss=0.1128, over 19793.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001413, whisper_loss=0.09016, over 3816023.28 frames. ], batch size: 77, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:27:40,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-20 03:27:44,294 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.280e+01 2.515e+01 2.844e+01 3.725e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-20 03:27:47,024 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-20 03:27:52,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4644400.0, ans=0.125 2024-08-20 03:27:54,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4644400.0, ans=0.1 2024-08-20 03:28:06,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4644500.0, ans=0.025 2024-08-20 03:28:09,762 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.77 vs. 
limit=12.0 2024-08-20 03:28:15,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4644500.0, ans=0.2 2024-08-20 03:28:33,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4644600.0, ans=0.125 2024-08-20 03:28:48,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=10.0 2024-08-20 03:28:56,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4644800.0, ans=0.1 2024-08-20 03:28:57,058 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5100, loss[loss=0.1015, beats_loss=0.01009, ecapa_loss=0.0001413, whisper_loss=0.09002, over 21278.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.000141, whisper_loss=0.08983, over 3800799.05 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:29:07,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-20 03:29:14,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4644900.0, ans=0.125 2024-08-20 03:29:14,523 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-08-20 03:29:23,426 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 03:29:30,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4644900.0, ans=0.125 2024-08-20 03:29:38,887 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
16 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 03:29:43,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4645000.0, ans=0.125 2024-08-20 03:29:45,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-20 03:29:47,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4645000.0, ans=0.125 2024-08-20 03:29:48,048 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 03:30:12,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=22.5 2024-08-20 03:30:21,899 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 19 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-20 03:30:27,059 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5150, loss[loss=0.09111, beats_loss=0.01028, ecapa_loss=0.0001598, whisper_loss=0.07923, over 16948.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001414, whisper_loss=0.09023, over 3785721.41 frames. ], batch size: 70, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:30:27,261 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 03:30:41,295 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
40 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 03:30:42,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.233e+01 2.389e+01 2.694e+01 3.675e+01, threshold=4.778e+01, percent-clipped=0.0 2024-08-20 03:30:55,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4645400.0, ans=0.125 2024-08-20 03:31:22,806 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 03:31:29,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4645600.0, ans=0.125 2024-08-20 03:31:33,719 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 23 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-20 03:31:46,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-08-20 03:31:54,532 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5200, loss[loss=0.09935, beats_loss=0.01115, ecapa_loss=0.0001493, whisper_loss=0.0867, over 19583.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001401, whisper_loss=0.08988, over 3802815.58 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:32:03,587 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 31 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 03:32:14,032 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 20 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 03:32:28,097 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 03:32:32,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4646000.0, ans=0.1 2024-08-20 03:32:51,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4646100.0, ans=0.125 2024-08-20 03:32:52,185 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 03:33:04,687 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 03:33:13,141 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 03:33:13,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4646200.0, ans=0.125 2024-08-20 03:33:14,712 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 03:33:15,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4646200.0, ans=0.1 2024-08-20 03:33:21,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4646200.0, ans=0.2 2024-08-20 03:33:24,349 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5250, loss[loss=0.1125, beats_loss=0.008976, ecapa_loss=0.0001449, whisper_loss=0.1021, over 21314.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001423, whisper_loss=0.09119, over 3796376.47 frames. 
], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:33:30,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4646300.0, ans=0.125 2024-08-20 03:33:37,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-20 03:33:40,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.321e+01 2.600e+01 2.824e+01 7.148e+01, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 03:33:40,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4646400.0, ans=0.125 2024-08-20 03:33:42,316 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 03:33:46,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4646400.0, ans=0.1 2024-08-20 03:33:49,194 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-20 03:34:06,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4646500.0, ans=0.0 2024-08-20 03:34:09,736 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 03:34:20,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4646600.0, ans=0.125 2024-08-20 03:34:47,900 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:34:53,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4646700.0, ans=0.0 2024-08-20 03:34:55,852 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5300, loss[loss=0.07561, beats_loss=0.01211, ecapa_loss=0.0001367, whisper_loss=0.06214, over 19398.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001417, whisper_loss=0.09103, over 3826554.92 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:35:00,597 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.932e-02 2024-08-20 03:35:13,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4646900.0, ans=0.125 2024-08-20 03:35:39,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4647000.0, ans=0.125 2024-08-20 03:35:45,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4647000.0, ans=0.125 2024-08-20 03:35:59,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4647100.0, ans=0.125 2024-08-20 03:36:07,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4647100.0, ans=0.125 2024-08-20 03:36:14,204 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4647200.0, ans=0.125 2024-08-20 03:36:18,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-20 03:36:36,418 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5350, loss[loss=0.09853, beats_loss=0.0109, ecapa_loss=0.0001394, whisper_loss=0.08624, over 22667.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.000141, whisper_loss=0.09044, over 3781709.91 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:36:52,291 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 03:36:57,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.184e+01 2.426e+01 2.687e+01 4.168e+01, threshold=4.852e+01, percent-clipped=0.0 2024-08-20 03:37:32,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4647500.0, ans=0.0 2024-08-20 03:37:34,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4647500.0, ans=0.125 2024-08-20 03:37:36,424 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 03:37:39,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-20 03:38:34,830 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 03:38:35,821 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5400, loss[loss=0.08233, beats_loss=0.01299, ecapa_loss=0.0001062, whisper_loss=0.06828, over 15545.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001421, whisper_loss=0.09071, over 3774018.24 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:38:43,881 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 03:39:20,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4648000.0, ans=0.04949747468305833 2024-08-20 03:39:44,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4648100.0, ans=0.0 2024-08-20 03:40:06,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4648200.0, ans=0.0 2024-08-20 03:40:07,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2024-08-20 03:40:28,661 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5450, loss[loss=0.1296, beats_loss=0.008562, ecapa_loss=0.0001347, whisper_loss=0.1197, over 16998.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.0001417, whisper_loss=0.09081, over 3772307.78 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:40:29,096 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 03:40:43,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. 
limit=15.0 2024-08-20 03:40:45,557 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.272e+01 2.507e+01 2.790e+01 3.633e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-20 03:40:52,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4648400.0, ans=0.1 2024-08-20 03:40:55,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4648400.0, ans=0.0 2024-08-20 03:41:03,376 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 03:41:16,597 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 03:41:41,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2024-08-20 03:41:52,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4648700.0, ans=0.1 2024-08-20 03:42:00,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4648700.0, ans=0.125 2024-08-20 03:42:12,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4648700.0, ans=0.0 2024-08-20 03:42:12,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4648700.0, ans=0.0 2024-08-20 03:42:18,161 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5500, loss[loss=0.1174, beats_loss=0.008373, ecapa_loss=0.0001516, whisper_loss=0.1075, over 22813.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001415, whisper_loss=0.09098, over 3792360.10 frames. 
], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:42:19,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=4648800.0, ans=0.1 2024-08-20 03:43:31,184 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 03:43:47,675 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 03:44:03,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4649200.0, ans=0.2 2024-08-20 03:44:11,987 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5550, loss[loss=0.1002, beats_loss=0.009366, ecapa_loss=0.0001584, whisper_loss=0.08924, over 19384.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001407, whisper_loss=0.09019, over 3759427.60 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:44:13,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4649300.0, ans=0.125 2024-08-20 03:44:16,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2024-08-20 03:44:35,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.290e+01 2.579e+01 2.821e+01 2.823e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-20 03:44:42,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4649400.0, ans=0.125 2024-08-20 03:44:46,862 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
37 from LS+wenet, 19 from Vox, 37 from AS
2024-08-20 03:44:51,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4649400.0, ans=0.0
2024-08-20 03:45:06,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4649500.0, ans=0.0
2024-08-20 03:45:37,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4649600.0, ans=0.125
2024-08-20 03:45:39,508 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 from AS
2024-08-20 03:45:51,255 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 from AS
2024-08-20 03:45:55,727 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 14 from Vox, 27 from AS
2024-08-20 03:46:11,398 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5600, loss[loss=0.107, beats_loss=0.008404, ecapa_loss=0.0001685, whisper_loss=0.09692, over 20102.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001421, whisper_loss=0.09, over 3776165.22 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:46:15,004 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 26 from LS+wenet, 20 from Vox, 16 from AS
2024-08-20 03:46:15,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0
2024-08-20 03:46:49,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4649900.0, ans=0.0
2024-08-20 03:47:14,575 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 from AS
2024-08-20 03:47:39,457 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 from AS
2024-08-20 03:47:59,304 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5650, loss[loss=0.1067, beats_loss=0.0103, ecapa_loss=0.000144, whisper_loss=0.09495, over 20564.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.08992, over 3794074.49 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:48:08,591 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 26 from Vox, 30 from AS
2024-08-20 03:48:20,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.429e+01 2.607e+01 2.937e+01 4.534e+02, threshold=5.214e+01, percent-clipped=3.0
2024-08-20 03:48:58,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4650500.0, ans=0.1
2024-08-20 03:49:01,733 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 from AS
2024-08-20 03:49:02,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4650500.0, ans=10.0
2024-08-20 03:49:07,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2024-08-20 03:49:11,266 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5
2024-08-20 03:49:26,260 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 17 from LS+wenet, 30 from Vox, 29 from AS
2024-08-20 03:49:28,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4650600.0, ans=0.0
2024-08-20 03:49:33,919 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 29 from Vox, 29 from AS
2024-08-20 03:49:37,007 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.42 vs. limit=22.5
2024-08-20 03:49:54,592 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5700, loss[loss=0.1047, beats_loss=0.005638, ecapa_loss=0.0001797, whisper_loss=0.09729, over 17643.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.09039, over 3815660.87 frames. ], batch size: 73, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:50:27,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4650900.0, ans=0.0
2024-08-20 03:50:34,468 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=22.5
2024-08-20 03:50:44,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4651000.0, ans=0.1
2024-08-20 03:51:20,957 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS
2024-08-20 03:51:32,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4651200.0, ans=0.07
2024-08-20 03:51:38,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4651200.0, ans=0.0
2024-08-20 03:51:41,479 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5750, loss[loss=0.08962, beats_loss=0.01066, ecapa_loss=0.0001205, whisper_loss=0.07776, over 16007.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001415, whisper_loss=0.09042, over 3819877.48 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:51:46,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4651300.0, ans=0.125
2024-08-20 03:51:48,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4651300.0, ans=0.0
2024-08-20 03:51:57,830 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 from AS
2024-08-20 03:52:01,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.309e+01 2.653e+01 2.956e+01 1.340e+02, threshold=5.306e+01, percent-clipped=1.0
2024-08-20 03:52:14,402 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS
2024-08-20 03:52:19,167 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 from AS
2024-08-20 03:52:54,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0
2024-08-20 03:52:56,309 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 from AS
2024-08-20 03:53:30,703 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5800, loss[loss=0.1054, beats_loss=0.01242, ecapa_loss=0.0001674, whisper_loss=0.09132, over 17520.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.09038, over 3872428.77 frames. ], batch size: 75, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:53:56,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4651900.0, ans=0.0
2024-08-20 03:54:09,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4651900.0, ans=0.1
2024-08-20 03:54:11,372 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 from AS
2024-08-20 03:54:37,587 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 from AS
2024-08-20 03:54:37,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4652100.0, ans=0.125
2024-08-20 03:54:39,454 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-20 03:54:48,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4652100.0, ans=0.125
2024-08-20 03:55:15,722 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5850, loss[loss=0.09544, beats_loss=0.009342, ecapa_loss=0.0001772, whisper_loss=0.08433, over 12591.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.08997, over 3825278.12 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:55:34,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.203e+01 2.512e+01 2.750e+01 3.616e+02, threshold=5.024e+01, percent-clipped=2.0
2024-08-20 03:55:35,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4652400.0, ans=0.0
2024-08-20 03:55:41,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4652400.0, ans=0.125
2024-08-20 03:55:44,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4652400.0, ans=0.1
2024-08-20 03:55:57,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=4652500.0, ans=0.02
2024-08-20 03:56:13,204 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 from AS
2024-08-20 03:56:15,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4652500.0, ans=0.1
2024-08-20 03:56:24,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4652600.0, ans=0.0
2024-08-20 03:56:28,680 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS
2024-08-20 03:56:51,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0
2024-08-20 03:57:04,979 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 from AS
2024-08-20 03:57:05,955 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5900, loss[loss=0.08985, beats_loss=0.009086, ecapa_loss=0.0001019, whisper_loss=0.07975, over 15337.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001409, whisper_loss=0.08969, over 3793721.70 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:57:15,882 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 03:57:37,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4652900.0, ans=0.125
2024-08-20 03:57:47,432 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS
2024-08-20 03:58:17,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4653100.0, ans=0.2
2024-08-20 03:58:38,097 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 from AS
2024-08-20 03:58:59,959 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 5950, loss[loss=0.09728, beats_loss=0.01151, ecapa_loss=0.0001262, whisper_loss=0.0845, over 22131.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.09037, over 3829967.89 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:59:03,943 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 29 from LS+wenet, 22 from Vox, 29 from AS
2024-08-20 03:59:21,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.326e+01 2.621e+01 2.901e+01 3.816e+01, threshold=5.242e+01, percent-clipped=0.0
2024-08-20 03:59:47,236 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 from AS
2024-08-20 04:00:05,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4653600.0, ans=0.0
2024-08-20 04:00:05,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4653600.0, ans=0.125
2024-08-20 04:00:30,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0
2024-08-20 04:00:42,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4653700.0, ans=0.125
2024-08-20 04:00:49,267 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6000, loss[loss=0.08861, beats_loss=0.01272, ecapa_loss=0.000127, whisper_loss=0.07462, over 18379.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001405, whisper_loss=0.09021, over 3804292.22 frames. ], batch size: 73, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:00:49,267 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss
2024-08-20 04:01:25,867 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005122, whisper_loss=0.2485, over 931116.00 frames.
2024-08-20 04:01:50,551 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames.
2024-08-20 04:02:12,319 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9170, 1.8586, 1.9567, 1.7508], device='cuda:2')
2024-08-20 04:02:40,085 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4495, 2.1065, 2.4260, 2.0802], device='cuda:2')
2024-08-20 04:03:00,507 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.8787, 2.2036, 2.3679, 2.1119], device='cuda:2')
2024-08-20 04:03:07,343 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6235, 2.1800, 2.0530, 1.9782], device='cuda:2')
2024-08-20 04:03:25,353 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 04:03:25,357 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB
2024-08-20 04:03:41,428 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS
2024-08-20 04:04:23,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4654100.0, ans=0.07
2024-08-20 04:04:43,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4654200.0, ans=0.0
2024-08-20 04:04:48,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4654200.0, ans=0.1
2024-08-20 04:04:54,625 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6050, loss[loss=0.09277, beats_loss=0.0132, ecapa_loss=0.0001303, whisper_loss=0.07827, over 20730.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.09079, over 3820765.88 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:04:55,302 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS
2024-08-20 04:05:05,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4654300.0, ans=10.0
2024-08-20 04:05:05,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0
2024-08-20 04:05:06,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4654300.0, ans=0.0
2024-08-20 04:05:09,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.277e+01 2.536e+01 2.822e+01 4.959e+01, threshold=5.072e+01, percent-clipped=0.0
2024-08-20 04:05:10,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4654400.0, ans=0.125
2024-08-20 04:05:13,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4654400.0, ans=0.125
2024-08-20 04:05:37,956 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 19 from Vox, 48 from AS
2024-08-20 04:05:59,835 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-20 04:06:24,005 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6100, loss[loss=0.1117, beats_loss=0.009347, ecapa_loss=0.0001605, whisper_loss=0.1007, over 13967.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001403, whisper_loss=0.08971, over 3808029.66 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:06:26,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.03 vs. limit=22.5
2024-08-20 04:06:41,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4654800.0, ans=0.2
2024-08-20 04:06:43,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4654900.0, ans=0.0
2024-08-20 04:06:46,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4654900.0, ans=0.0
2024-08-20 04:07:11,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4655000.0, ans=0.0
2024-08-20 04:07:20,143 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 from AS
2024-08-20 04:07:57,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0
2024-08-20 04:08:11,819 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6150, loss[loss=0.08931, beats_loss=0.01125, ecapa_loss=0.0001171, whisper_loss=0.07689, over 15266.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001393, whisper_loss=0.09003, over 3834013.12 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:08:15,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0
2024-08-20 04:08:31,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.289e+01 2.520e+01 2.857e+01 4.942e+02, threshold=5.040e+01, percent-clipped=2.0
2024-08-20 04:08:39,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.12 vs. limit=22.5
2024-08-20 04:08:56,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4655500.0, ans=0.125
2024-08-20 04:08:56,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4655500.0, ans=0.0
2024-08-20 04:08:58,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4655500.0, ans=0.125
2024-08-20 04:09:45,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4655700.0, ans=0.125
2024-08-20 04:09:49,716 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 13 from LS+wenet, 18 from Vox, 21 from AS
2024-08-20 04:09:59,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0
2024-08-20 04:10:01,753 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6200, loss[loss=0.09426, beats_loss=0.01324, ecapa_loss=0.000148, whisper_loss=0.07954, over 17962.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001399, whisper_loss=0.09053, over 3856679.25 frames. ], batch size: 73, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:10:37,657 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 from AS
2024-08-20 04:11:05,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4656000.0, ans=0.125
2024-08-20 04:11:28,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4656200.0, ans=0.125
2024-08-20 04:11:34,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4656200.0, ans=0.125
2024-08-20 04:11:50,492 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6250, loss[loss=0.09417, beats_loss=0.0107, ecapa_loss=0.0001592, whisper_loss=0.08188, over 21363.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001401, whisper_loss=0.08979, over 3834567.33 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:12:09,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.244e+01 2.486e+01 2.895e+01 5.036e+01, threshold=4.971e+01, percent-clipped=0.0
2024-08-20 04:12:13,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4656400.0, ans=0.0
2024-08-20 04:12:22,917 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 14 from LS+wenet, 13 from Vox, 25 from AS
2024-08-20 04:12:24,611 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 from AS
2024-08-20 04:13:40,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4656800.0, ans=0.1
2024-08-20 04:13:40,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4656800.0, ans=0.125
2024-08-20 04:13:41,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6300, loss[loss=0.1133, beats_loss=0.01075, ecapa_loss=0.0001268, whisper_loss=0.1013, over 18642.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001395, whisper_loss=0.09022, over 3828464.60 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:13:59,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4656800.0, ans=0.2
2024-08-20 04:14:06,110 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=15.0
2024-08-20 04:14:09,855 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 from AS
2024-08-20 04:14:15,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4656900.0, ans=0.0
2024-08-20 04:14:38,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4657000.0, ans=0.1
2024-08-20 04:14:43,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4657000.0, ans=0.125
2024-08-20 04:15:36,410 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6350, loss[loss=0.09123, beats_loss=0.009711, ecapa_loss=0.0001757, whisper_loss=0.07976, over 20482.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001389, whisper_loss=0.08946, over 3813829.97 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:15:42,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=12.0
2024-08-20 04:15:56,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.231e+01 2.542e+01 2.829e+01 6.825e+01, threshold=5.084e+01, percent-clipped=1.0
2024-08-20 04:16:14,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4657400.0, ans=0.05
2024-08-20 04:16:14,397 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0
2024-08-20 04:16:29,012 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 21 from LS+wenet, 14 from Vox, 40 from AS
2024-08-20 04:16:30,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4657500.0, ans=0.2
2024-08-20 04:16:53,970 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS
2024-08-20 04:17:02,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4657700.0, ans=0.125
2024-08-20 04:17:09,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4657700.0, ans=0.04949747468305833
2024-08-20 04:17:14,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4657700.0, ans=0.1
2024-08-20 04:17:21,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0
2024-08-20 04:17:24,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0
2024-08-20 04:17:25,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4657800.0, ans=0.1
2024-08-20 04:17:26,691 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6400, loss[loss=0.1042, beats_loss=0.01132, ecapa_loss=0.0001181, whisper_loss=0.09172, over 18896.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001394, whisper_loss=0.08922, over 3819062.50 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:17:27,892 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 from AS
2024-08-20 04:17:28,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4657800.0, ans=0.125
2024-08-20 04:18:08,352 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 from AS
2024-08-20 04:18:08,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=12.0
2024-08-20 04:18:28,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4658000.0, ans=0.0
2024-08-20 04:18:37,848 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 04:18:47,019 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0
2024-08-20 04:18:49,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4658100.0, ans=0.0
2024-08-20 04:18:51,297 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 from AS
2024-08-20 04:18:54,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4658100.0, ans=0.2
2024-08-20 04:19:05,661 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 12 from LS+wenet, 21 from Vox, 23 from AS
2024-08-20 04:19:06,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.79 vs. limit=15.0
2024-08-20 04:19:07,782 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS
2024-08-20 04:19:10,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4658200.0, ans=0.0
2024-08-20 04:19:12,831 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 from AS
2024-08-20 04:19:18,228 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6450, loss[loss=0.08805, beats_loss=0.012, ecapa_loss=0.000133, whisper_loss=0.07472, over 21522.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08919, over 3812463.72 frames. ], batch size: 86, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:19:19,224 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 from AS
2024-08-20 04:19:32,796 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 from AS
2024-08-20 04:19:37,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4658300.0, ans=0.0
2024-08-20 04:19:38,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.209e+01 2.444e+01 2.735e+01 9.511e+01, threshold=4.888e+01, percent-clipped=1.0
2024-08-20 04:19:49,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4658400.0, ans=0.125
2024-08-20 04:20:03,876 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 from AS
2024-08-20 04:20:27,436 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 21 from LS+wenet, 22 from Vox, 37 from AS
2024-08-20 04:21:10,952 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0
2024-08-20 04:21:11,449 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6500, loss[loss=0.09373, beats_loss=0.01228, ecapa_loss=9.845e-05, whisper_loss=0.08047, over 14069.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.08884, over 3815220.15 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:21:27,187 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 35 from LS+wenet, 15 from Vox, 39 from AS
2024-08-20 04:21:27,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4658800.0, ans=0.0
2024-08-20 04:21:31,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4658900.0, ans=0.125
2024-08-20 04:21:37,718 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 from AS
2024-08-20 04:22:07,212 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0
2024-08-20 04:22:40,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4659200.0, ans=0.0
2024-08-20 04:22:45,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4659200.0, ans=0.2
2024-08-20 04:22:55,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4659200.0, ans=0.05
2024-08-20 04:23:02,357 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6550, loss[loss=0.1138, beats_loss=0.01084, ecapa_loss=0.0001179, whisper_loss=0.1018, over 20226.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001405, whisper_loss=0.08927, over 3824054.97 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:23:23,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.308e+01 2.565e+01 2.877e+01 4.180e+01, threshold=5.130e+01, percent-clipped=0.0
2024-08-20 04:23:30,619 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 28 from Vox, 24 from AS
2024-08-20 04:23:54,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4659500.0, ans=0.1
2024-08-20 04:24:28,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5
2024-08-20 04:25:01,161 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6600, loss[loss=0.09183, beats_loss=0.01103, ecapa_loss=0.0001262, whisper_loss=0.07954, over 15496.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.08924, over 3853396.05 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:25:10,867 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0
2024-08-20 04:25:12,992 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 04:25:28,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4659900.0, ans=0.0
2024-08-20 04:25:34,858 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS
2024-08-20 04:25:53,134 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS
2024-08-20 04:25:57,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5
2024-08-20 04:26:48,966 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 17 from LS+wenet, 21 from Vox, 16 from AS
2024-08-20 04:26:52,839 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6650, loss[loss=0.1163, beats_loss=0.009075, ecapa_loss=0.0001485, whisper_loss=0.1058, over 20292.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.000141, whisper_loss=0.08918, over 3857545.86 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:27:13,210 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 29 from LS+wenet, 26 from Vox, 22 from AS
2024-08-20 04:27:14,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.439e+01 2.716e+01 3.206e+01 5.057e+01, threshold=5.432e+01, percent-clipped=0.0
2024-08-20 04:27:49,314 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 from AS
2024-08-20 04:27:49,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4660500.0, ans=0.125
2024-08-20 04:28:11,001 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 18 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 04:28:30,191 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.646e-01
2024-08-20 04:28:52,037 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6700, loss[loss=0.08323, beats_loss=0.01106, ecapa_loss=0.000151, whisper_loss=0.07065, over 17713.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.000141, whisper_loss=0.09042, over 3883940.45 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:29:06,665 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 from AS
2024-08-20 04:29:30,837 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 04:30:06,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4661100.0, ans=0.2
2024-08-20 04:30:15,084 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 21 from LS+wenet, 26 from Vox, 28 from AS
2024-08-20 04:30:22,276 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-20 04:30:33,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4661200.0, ans=0.025
2024-08-20 04:30:33,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4661200.0, ans=0.125
2024-08-20 04:30:37,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4661200.0, ans=0.125
2024-08-20 04:30:41,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4661200.0, ans=10.0
2024-08-20 04:30:49,568 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6750, loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001302, whisper_loss=0.08998, over 19067.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001416, whisper_loss=0.09075, over 3878438.14 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:31:05,913 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0
2024-08-20 04:31:08,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.262e+01 2.503e+01 2.805e+01 3.998e+01, threshold=5.006e+01, percent-clipped=0.0
2024-08-20 04:31:17,126 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS
2024-08-20 04:31:29,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4661400.0, ans=0.2
2024-08-20 04:31:29,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4661400.0, ans=0.125
2024-08-20 04:31:40,680 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 from AS
2024-08-20 04:31:44,955 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0
2024-08-20 04:31:45,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0
2024-08-20 04:32:17,655 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 from AS
2024-08-20 04:32:38,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4661700.0, ans=0.0
2024-08-20 04:32:41,870 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6800, loss[loss=0.1141, beats_loss=0.008892, ecapa_loss=0.0001175, whisper_loss=0.104, over 15394.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.000142, whisper_loss=0.09079, over 3856798.04 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:32:45,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4661800.0, ans=0.125
2024-08-20 04:32:54,651 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 17 from LS+wenet, 15 from Vox, 35 from AS
2024-08-20 04:33:04,359 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 from AS
2024-08-20 04:33:08,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4661900.0, ans=0.0
2024-08-20 04:33:39,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4662000.0, ans=0.125
2024-08-20 04:33:42,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.32 vs.
limit=12.0 2024-08-20 04:34:03,805 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 04:34:16,097 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 32 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 04:34:18,851 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 35 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-20 04:34:23,524 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 04:34:35,335 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6850, loss[loss=0.09335, beats_loss=0.01075, ecapa_loss=0.0001031, whisper_loss=0.08157, over 17085.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01033, ecapa_loss=0.0001412, whisper_loss=0.0913, over 3843796.48 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:34:50,381 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 14 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-20 04:34:52,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4662300.0, ans=0.0 2024-08-20 04:34:55,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.273e+01 2.508e+01 2.881e+01 4.383e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 04:35:05,427 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 20 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 04:35:21,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4662500.0, ans=10.0 2024-08-20 04:35:32,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4662500.0, ans=0.0 2024-08-20 04:35:46,353 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-20 04:35:53,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2024-08-20 04:35:58,401 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 21 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-20 04:36:05,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4662700.0, ans=0.125 2024-08-20 04:36:24,272 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 04:36:24,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4662700.0, ans=0.125 2024-08-20 04:36:26,910 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 32 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 04:36:27,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4662800.0, ans=0.05 2024-08-20 04:36:27,892 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6900, loss[loss=0.1185, beats_loss=0.008216, ecapa_loss=0.0001323, whisper_loss=0.109, over 22567.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001405, whisper_loss=0.09078, over 3824966.87 frames. ], batch size: 86, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:36:55,070 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 04:37:05,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4662900.0, ans=0.0 2024-08-20 04:38:14,713 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 6950, loss[loss=0.1281, beats_loss=0.008689, ecapa_loss=0.0001539, whisper_loss=0.1179, over 23454.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.09018, over 3808249.06 frames. ], batch size: 94, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:38:35,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4663300.0, ans=0.125 2024-08-20 04:38:35,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.402e+01 2.667e+01 2.923e+01 3.663e+02, threshold=5.334e+01, percent-clipped=2.0 2024-08-20 04:38:51,895 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 04:38:59,030 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0 2024-08-20 04:39:07,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4663500.0, ans=0.125 2024-08-20 04:39:11,249 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 04:39:28,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4663600.0, ans=0.125 2024-08-20 04:39:43,616 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 04:39:47,670 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 04:39:57,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4663800.0, ans=0.125 2024-08-20 04:39:58,160 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7000, loss[loss=0.1005, beats_loss=0.01094, ecapa_loss=0.0001418, whisper_loss=0.0881, over 22012.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.0897, over 3762489.83 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:40:00,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4663800.0, ans=0.125 2024-08-20 04:40:04,768 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 04:40:04,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4663800.0, ans=0.125 2024-08-20 04:40:12,726 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 04:40:12,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4663800.0, ans=0.125 2024-08-20 04:40:59,343 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 18 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 04:41:07,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4664100.0, ans=0.125 2024-08-20 04:41:09,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4664100.0, ans=0.0 2024-08-20 04:41:10,476 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 04:41:19,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4664200.0, ans=0.125 2024-08-20 04:41:31,588 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7050, loss[loss=0.1005, beats_loss=0.009875, ecapa_loss=0.0001126, whisper_loss=0.08946, over 15173.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001397, whisper_loss=0.08984, over 3787762.33 frames. 
], batch size: 56, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:41:36,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2024-08-20 04:41:47,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.321e+01 2.580e+01 2.916e+01 2.806e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-20 04:42:09,553 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-20 04:42:23,671 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 04:42:23,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4664500.0, ans=0.2 2024-08-20 04:42:26,070 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4664600.0, ans=0.0 2024-08-20 04:42:47,928 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=12.0 2024-08-20 04:42:51,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4664700.0, ans=0.09899494936611666 2024-08-20 04:43:05,567 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7100, loss[loss=0.1155, beats_loss=0.01249, ecapa_loss=0.0001179, whisper_loss=0.1019, over 15445.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08939, over 3743475.57 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:43:19,288 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 04:43:29,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4664900.0, ans=0.125 2024-08-20 04:43:33,725 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-20 04:43:47,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-20 04:43:53,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4665000.0, ans=0.125 2024-08-20 04:43:58,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4665000.0, ans=0.0 2024-08-20 04:44:29,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4665100.0, ans=0.1 2024-08-20 04:44:39,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4665200.0, ans=0.125 2024-08-20 04:44:42,131 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 18 from LS+wenet, 7 from Vox, 31 fro AS 2024-08-20 04:44:49,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4665200.0, ans=0.1 2024-08-20 04:44:57,408 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7150, loss[loss=0.09116, beats_loss=0.009247, ecapa_loss=0.0001647, whisper_loss=0.08027, over 17881.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001391, whisper_loss=0.08995, over 3734013.82 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:45:11,898 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 04:45:17,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.230e+01 2.408e+01 2.713e+01 4.387e+01, threshold=4.817e+01, percent-clipped=0.0 2024-08-20 04:45:54,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4665500.0, ans=0.125 2024-08-20 04:46:48,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4665700.0, ans=0.0 2024-08-20 04:46:52,136 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7200, loss[loss=0.09999, beats_loss=0.008839, ecapa_loss=0.0001233, whisper_loss=0.08991, over 16221.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.09084, over 3754815.30 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:47:08,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4665800.0, ans=0.1 2024-08-20 04:47:13,908 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 04:47:39,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4666000.0, ans=0.0 2024-08-20 04:47:43,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4666000.0, ans=0.125 2024-08-20 04:48:19,207 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 04:48:27,427 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 16 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-20 04:48:32,038 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 04:48:39,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4666200.0, ans=0.125 2024-08-20 04:48:44,274 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7250, loss[loss=0.1029, beats_loss=0.008622, ecapa_loss=0.0001371, whisper_loss=0.09295, over 18055.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001407, whisper_loss=0.09095, over 3784436.15 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:48:47,400 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 26 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-20 04:49:04,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.278e+01 2.449e+01 2.713e+01 3.965e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:49:29,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4666500.0, ans=0.125 2024-08-20 04:49:31,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4666500.0, ans=0.09899494936611666 2024-08-20 04:49:33,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4666500.0, ans=0.0 2024-08-20 04:49:44,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4666500.0, ans=0.0 2024-08-20 04:49:50,691 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 04:50:05,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4666600.0, ans=0.2 2024-08-20 04:50:33,859 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7300, loss[loss=0.1044, beats_loss=0.009353, ecapa_loss=0.0001646, whisper_loss=0.09335, over 21759.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001408, whisper_loss=0.09004, over 3737637.27 frames. ], batch size: 86, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:51:52,327 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 04:52:05,969 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 04:52:14,457 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 04:52:28,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-08-20 04:52:29,463 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7350, loss[loss=0.09159, beats_loss=0.01031, ecapa_loss=0.000151, whisper_loss=0.07978, over 15623.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08942, over 3746984.97 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:52:50,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.449e+01 2.717e+01 4.858e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:53:06,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4667400.0, ans=0.1 2024-08-20 04:53:26,703 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 04:53:44,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4667600.0, ans=0.125 2024-08-20 04:53:56,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4667600.0, ans=0.2 2024-08-20 04:54:04,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4667700.0, ans=0.0 2024-08-20 04:54:09,069 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=12.0 2024-08-20 04:54:11,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4667700.0, ans=0.125 2024-08-20 04:54:13,070 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 04:54:20,630 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7400, loss[loss=0.1135, beats_loss=0.00992, ecapa_loss=0.0001429, whisper_loss=0.1021, over 20116.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.08992, over 3754448.66 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:54:34,636 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 04:54:44,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4667900.0, ans=0.04949747468305833 2024-08-20 04:55:06,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4668000.0, ans=0.0 2024-08-20 04:55:18,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4668000.0, ans=0.0 2024-08-20 04:55:39,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-20 04:55:48,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4668100.0, ans=0.2 2024-08-20 04:56:18,306 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7450, loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001347, whisper_loss=0.09067, over 21709.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001422, whisper_loss=0.09033, over 3766335.03 frames. ], batch size: 86, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:56:31,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4668300.0, ans=0.1 2024-08-20 04:56:38,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4668300.0, ans=0.0 2024-08-20 04:56:39,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.202e+01 2.465e+01 2.731e+01 3.799e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-20 04:56:48,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4668400.0, ans=0.2 2024-08-20 04:57:28,440 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 04:57:28,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4668600.0, ans=0.125 2024-08-20 04:57:47,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2024-08-20 04:57:53,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-20 04:57:58,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2024-08-20 04:57:59,425 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 04:58:12,397 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7500, loss[loss=0.1006, beats_loss=0.01157, ecapa_loss=0.0001469, whisper_loss=0.08753, over 13703.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.09033, over 3756539.28 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:58:18,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4668800.0, ans=0.2 2024-08-20 04:58:20,360 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 04:58:32,522 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. limit=10.0 2024-08-20 04:58:35,311 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 04:58:42,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4668900.0, ans=0.0 2024-08-20 04:58:47,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=12.0 2024-08-20 04:59:08,446 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4669000.0, ans=0.125 2024-08-20 04:59:46,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4669200.0, ans=0.07 2024-08-20 04:59:55,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4669200.0, ans=10.0 2024-08-20 04:59:55,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4669200.0, ans=0.0 2024-08-20 05:00:02,324 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 05:00:02,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4669200.0, ans=0.125 2024-08-20 05:00:05,032 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7550, loss[loss=0.09795, beats_loss=0.01018, ecapa_loss=0.0001853, whisper_loss=0.08591, over 20847.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01028, ecapa_loss=0.0001419, whisper_loss=0.09016, over 3751401.94 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:00:22,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.290e+01 2.609e+01 3.060e+01 2.674e+02, threshold=5.218e+01, percent-clipped=1.0 2024-08-20 05:00:30,661 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
19 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-20 05:00:30,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4669400.0, ans=0.0 2024-08-20 05:00:47,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4669500.0, ans=0.125 2024-08-20 05:01:16,589 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 05:01:29,101 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-20 05:01:52,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-20 05:01:54,753 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 05:01:57,852 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7600, loss[loss=0.09409, beats_loss=0.01018, ecapa_loss=0.000154, whisper_loss=0.08236, over 19989.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0103, ecapa_loss=0.0001416, whisper_loss=0.08972, over 3759482.84 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:01:59,622 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 05:02:15,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4669800.0, ans=0.0 2024-08-20 05:02:17,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4669800.0, ans=0.0 2024-08-20 05:02:17,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4669800.0, ans=0.07 2024-08-20 05:02:39,408 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 05:03:13,403 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.00 vs. limit=6.0 2024-08-20 05:03:23,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4670100.0, ans=0.0 2024-08-20 05:03:45,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4670200.0, ans=0.125 2024-08-20 05:03:47,146 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 38 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 05:03:47,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4670300.0, ans=0.1 2024-08-20 05:03:48,111 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7650, loss[loss=0.128, beats_loss=0.006543, ecapa_loss=0.0001678, whisper_loss=0.1198, over 21643.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0102, ecapa_loss=0.0001422, whisper_loss=0.0908, over 3794943.21 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:04:01,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4670300.0, ans=0.125 2024-08-20 05:04:08,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.328e+01 2.537e+01 2.838e+01 5.582e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-20 05:04:25,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=8.0 2024-08-20 05:04:36,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4670500.0, ans=0.1 2024-08-20 05:04:58,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4670600.0, ans=0.1 2024-08-20 05:05:11,858 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.801e-02 2024-08-20 05:05:32,578 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7700, loss[loss=0.0837, beats_loss=0.01059, ecapa_loss=0.0001451, whisper_loss=0.07166, over 20246.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01023, ecapa_loss=0.0001421, whisper_loss=0.09027, over 3767839.55 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:05:42,170 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 05:05:42,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4670800.0, ans=0.125 2024-08-20 05:05:55,186 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
29 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-20 05:06:15,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4671000.0, ans=0.0 2024-08-20 05:06:37,743 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 05:06:39,850 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 14 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 05:06:49,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4671100.0, ans=0.1 2024-08-20 05:07:28,235 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7750, loss[loss=0.07956, beats_loss=0.01228, ecapa_loss=0.0001481, whisper_loss=0.06579, over 14514.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01028, ecapa_loss=0.0001417, whisper_loss=0.08947, over 3781623.93 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:07:38,747 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 05:07:46,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4671300.0, ans=0.125 2024-08-20 05:07:49,782 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.264e+01 2.430e+01 2.732e+01 4.233e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-20 05:08:15,464 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 15 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 05:08:35,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.17 vs. limit=10.0 2024-08-20 05:08:40,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. 
limit=15.0 2024-08-20 05:08:51,024 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-20 05:09:18,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-20 05:09:31,020 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7800, loss[loss=0.09617, beats_loss=0.01003, ecapa_loss=0.000136, whisper_loss=0.08478, over 21443.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001405, whisper_loss=0.09022, over 3795519.85 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:09:38,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4671800.0, ans=0.1 2024-08-20 05:09:40,940 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 05:09:54,715 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 05:10:06,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2024-08-20 05:10:13,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4671900.0, ans=0.125 2024-08-20 05:10:15,771 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 05:10:24,668 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. 
limit=15.0 2024-08-20 05:10:39,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-08-20 05:10:54,443 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 05:11:11,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4672200.0, ans=0.125 2024-08-20 05:11:11,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.35 vs. limit=10.0 2024-08-20 05:11:26,336 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7850, loss[loss=0.1043, beats_loss=0.009818, ecapa_loss=0.0001389, whisper_loss=0.09305, over 20984.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01031, ecapa_loss=0.0001406, whisper_loss=0.08998, over 3808691.56 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:11:31,052 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 05:11:38,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4672300.0, ans=0.0 2024-08-20 05:11:46,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.521e+01 2.830e+01 3.600e+01, threshold=5.042e+01, percent-clipped=0.0 2024-08-20 05:12:02,708 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. 
limit=15.0 2024-08-20 05:12:09,983 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 05:12:48,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.36 vs. limit=22.5 2024-08-20 05:13:06,292 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 05:13:14,789 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7900, loss[loss=0.1031, beats_loss=0.01163, ecapa_loss=0.0001214, whisper_loss=0.0902, over 22346.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.000141, whisper_loss=0.09035, over 3847617.31 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:13:22,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4672800.0, ans=0.125 2024-08-20 05:13:36,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. limit=10.0 2024-08-20 05:13:52,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4672900.0, ans=0.05 2024-08-20 05:13:59,865 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-20 05:14:03,693 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 05:14:03,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4673000.0, ans=0.125 2024-08-20 05:14:07,415 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 05:14:07,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4673000.0, ans=0.125 2024-08-20 05:14:34,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4673100.0, ans=0.2 2024-08-20 05:15:03,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2024-08-20 05:15:08,773 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 7950, loss[loss=0.09018, beats_loss=0.01127, ecapa_loss=0.0001093, whisper_loss=0.07782, over 14163.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001405, whisper_loss=0.0901, over 3857884.31 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:15:27,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4673300.0, ans=0.0 2024-08-20 05:15:28,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.672e+01 2.282e+01 2.544e+01 2.823e+01 6.203e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-20 05:15:38,137 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 19 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-20 05:15:45,563 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 05:16:04,433 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 05:16:13,741 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 26 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 05:16:42,450 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
26 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 05:16:57,533 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8000, loss[loss=0.09458, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.08262, over 13178.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001404, whisper_loss=0.08988, over 3808270.43 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:17:17,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4673900.0, ans=0.1 2024-08-20 05:17:40,971 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 05:17:50,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4674000.0, ans=0.0 2024-08-20 05:17:57,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4674000.0, ans=0.0 2024-08-20 05:18:03,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. limit=10.0 2024-08-20 05:18:06,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4674100.0, ans=0.1 2024-08-20 05:18:19,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4674200.0, ans=0.0 2024-08-20 05:18:33,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4674200.0, ans=0.125 2024-08-20 05:18:41,448 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8050, loss[loss=0.08957, beats_loss=0.01063, ecapa_loss=0.0001153, whisper_loss=0.07779, over 13689.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.08942, over 3776180.25 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:18:57,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-20 05:18:59,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.351e+01 2.548e+01 2.857e+01 8.304e+01, threshold=5.095e+01, percent-clipped=2.0 2024-08-20 05:19:08,162 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 05:19:26,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4674500.0, ans=0.125 2024-08-20 05:19:40,004 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 23 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 05:19:47,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-08-20 05:20:05,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4674600.0, ans=6.0 2024-08-20 05:20:29,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4674700.0, ans=0.2 2024-08-20 05:20:32,126 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8100, loss[loss=0.1108, beats_loss=0.006736, ecapa_loss=0.0001659, whisper_loss=0.1024, over 16223.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.0894, over 3746118.70 frames. 
], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:20:47,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4674800.0, ans=0.0 2024-08-20 05:20:53,620 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 05:21:03,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4674900.0, ans=0.125 2024-08-20 05:21:10,324 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 05:21:27,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4675000.0, ans=0.0 2024-08-20 05:21:41,415 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4675100.0, ans=0.1 2024-08-20 05:22:00,020 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 05:22:04,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4675200.0, ans=0.125 2024-08-20 05:22:04,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4675200.0, ans=0.125 2024-08-20 05:22:07,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4675200.0, ans=0.0 2024-08-20 05:22:25,232 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8150, loss[loss=0.1022, beats_loss=0.009341, ecapa_loss=0.0001721, whisper_loss=0.09117, over 18393.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.0895, over 3748290.26 frames. 
], batch size: 75, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:22:29,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-20 05:22:47,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.184e+01 2.427e+01 2.667e+01 4.030e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 05:23:14,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2024-08-20 05:23:18,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4675500.0, ans=0.125 2024-08-20 05:23:34,885 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 26 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 05:23:35,289 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-20 05:23:50,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4675600.0, ans=0.2 2024-08-20 05:23:53,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4675600.0, ans=0.07 2024-08-20 05:24:15,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4675700.0, ans=0.1 2024-08-20 05:24:21,742 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8200, loss[loss=0.1227, beats_loss=0.005871, ecapa_loss=0.0001773, whisper_loss=0.1151, over 14764.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.09026, over 3758019.49 frames. 
], batch size: 59, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:24:38,917 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 05:24:43,922 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 05:24:59,360 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 29 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 05:24:59,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4675900.0, ans=0.125 2024-08-20 05:25:01,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4675900.0, ans=0.125 2024-08-20 05:25:34,276 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03858109936118126, model_norm_threshold=48.536659240722656 2024-08-20 05:25:34,437 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.171e+05, grad_sumsq=2.171e+05, orig_rms_sq=1.000e+00 2024-08-20 05:25:50,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-20 05:26:08,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4676200.0, ans=0.125 2024-08-20 05:26:15,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2024-08-20 05:26:15,886 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8250, loss[loss=0.1076, beats_loss=0.01079, ecapa_loss=0.0001048, whisper_loss=0.09579, over 18543.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.09051, over 3766211.64 frames. 
], batch size: 71, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:26:36,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.406e+01 2.629e+01 3.126e+01 1.258e+03, threshold=5.257e+01, percent-clipped=4.0 2024-08-20 05:26:38,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4676400.0, ans=0.04949747468305833 2024-08-20 05:26:40,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4676400.0, ans=0.1 2024-08-20 05:26:45,725 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 31 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-20 05:26:48,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4676400.0, ans=0.1 2024-08-20 05:27:48,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4676600.0, ans=0.125 2024-08-20 05:27:49,772 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 05:27:52,307 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 8 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 05:27:57,032 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 05:28:07,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4676700.0, ans=0.125 2024-08-20 05:28:10,738 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2024-08-20 05:28:15,364 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8300, loss[loss=0.09514, beats_loss=0.01306, ecapa_loss=0.0001093, whisper_loss=0.08099, over 22343.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001406, whisper_loss=0.09109, over 3812242.29 frames. ], batch size: 92, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:28:44,673 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-20 05:28:57,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4676900.0, ans=0.0 2024-08-20 05:28:59,148 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 35 from Vox, 33 fro AS 2024-08-20 05:29:02,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4677000.0, ans=0.125 2024-08-20 05:29:33,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4677100.0, ans=10.0 2024-08-20 05:29:42,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4677100.0, ans=0.125 2024-08-20 05:29:56,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4677200.0, ans=0.0 2024-08-20 05:30:03,574 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-20 05:30:08,648 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8350, loss[loss=0.1165, beats_loss=0.0112, ecapa_loss=0.0001568, whisper_loss=0.1037, over 16880.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.09064, over 3810399.90 frames. 
], batch size: 68, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:30:15,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4677300.0, ans=0.125 2024-08-20 05:30:26,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.380e+01 2.610e+01 3.013e+01 5.449e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-20 05:30:32,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4677400.0, ans=0.0 2024-08-20 05:30:37,854 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 05:30:47,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0 2024-08-20 05:31:04,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4677500.0, ans=0.125 2024-08-20 05:31:06,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4677500.0, ans=0.125 2024-08-20 05:31:11,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-20 05:31:37,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4677700.0, ans=0.5 2024-08-20 05:31:48,898 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8400, loss[loss=0.1062, beats_loss=0.009015, ecapa_loss=0.0001418, whisper_loss=0.09576, over 15726.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.08999, over 3801707.60 frames. 
], batch size: 61, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:31:49,534 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 05:31:52,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-20 05:31:53,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4677800.0, ans=0.2 2024-08-20 05:32:00,340 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 05:32:27,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4677900.0, ans=0.125 2024-08-20 05:32:59,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4678100.0, ans=0.125 2024-08-20 05:33:18,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4678200.0, ans=0.0 2024-08-20 05:33:24,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=12.0 2024-08-20 05:33:44,242 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8450, loss[loss=0.1191, beats_loss=0.009271, ecapa_loss=0.0001623, whisper_loss=0.1082, over 13444.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.09003, over 3803277.97 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:34:03,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.232e+01 2.452e+01 2.661e+01 1.500e+02, threshold=4.905e+01, percent-clipped=2.0 2024-08-20 05:34:11,616 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 05:34:39,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4678500.0, ans=0.2 2024-08-20 05:34:55,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4678600.0, ans=0.125 2024-08-20 05:34:55,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4678600.0, ans=0.125 2024-08-20 05:35:35,744 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8500, loss[loss=0.09722, beats_loss=0.009916, ecapa_loss=0.0001851, whisper_loss=0.08545, over 18918.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.08987, over 3802440.28 frames. ], batch size: 83, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:35:39,616 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 26 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-20 05:35:46,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4678800.0, ans=0.125 2024-08-20 05:36:16,465 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 05:36:17,060 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. 
limit=10.0 2024-08-20 05:36:53,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4679100.0, ans=0.125 2024-08-20 05:37:00,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4679100.0, ans=0.125 2024-08-20 05:37:12,959 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4679200.0, ans=0.0 2024-08-20 05:37:20,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2024-08-20 05:37:27,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4679200.0, ans=0.125 2024-08-20 05:37:30,692 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8550, loss[loss=0.08798, beats_loss=0.0122, ecapa_loss=0.0001509, whisper_loss=0.07427, over 21592.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001407, whisper_loss=0.08964, over 3804683.70 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:37:50,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.232e+01 2.501e+01 2.726e+01 3.621e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-20 05:37:52,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4679400.0, ans=0.125 2024-08-20 05:38:07,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4679400.0, ans=0.0 2024-08-20 05:38:21,324 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
36 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-20 05:38:28,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4679500.0, ans=0.2 2024-08-20 05:38:43,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4679600.0, ans=0.0 2024-08-20 05:38:49,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4679600.0, ans=0.2 2024-08-20 05:39:22,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4679700.0, ans=0.0 2024-08-20 05:39:24,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4679700.0, ans=0.1 2024-08-20 05:39:27,861 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8600, loss[loss=0.1306, beats_loss=0.008435, ecapa_loss=0.0001312, whisper_loss=0.1209, over 23060.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001412, whisper_loss=0.09038, over 3837240.74 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:39:53,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4679900.0, ans=0.025 2024-08-20 05:40:31,014 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 19 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 05:41:05,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. 
limit=15.0 2024-08-20 05:41:12,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4680200.0, ans=0.2 2024-08-20 05:41:17,511 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8650, loss[loss=0.08791, beats_loss=0.008925, ecapa_loss=0.0001786, whisper_loss=0.0772, over 13418.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001414, whisper_loss=0.08935, over 3814282.19 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:41:21,003 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 05:41:39,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.245e+01 2.496e+01 2.765e+01 3.926e+01, threshold=4.992e+01, percent-clipped=0.0 2024-08-20 05:43:03,927 WARNING [optim.py:496] (2/4) Scaling gradients by 0.040249649435281754, model_norm_threshold=49.920475006103516 2024-08-20 05:43:04,089 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.371e+05, grad_sumsq=4.172e+04, orig_rms_sq=3.286e+00 2024-08-20 05:43:15,060 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8700, loss[loss=0.1067, beats_loss=0.009231, ecapa_loss=0.0001308, whisper_loss=0.09618, over 16539.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001408, whisper_loss=0.08955, over 3824766.21 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:43:22,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4680800.0, ans=0.125 2024-08-20 05:43:28,461 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 from AS
2024-08-20 05:43:35,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4680900.0, ans=0.95
2024-08-20 05:43:56,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4680900.0, ans=0.125
2024-08-20 05:44:20,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4681000.0, ans=0.125
2024-08-20 05:44:40,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4681100.0, ans=0.125
2024-08-20 05:44:44,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4681100.0, ans=0.125
2024-08-20 05:44:46,485 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 from AS
2024-08-20 05:45:09,932 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 from AS
2024-08-20 05:45:10,989 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8750, loss[loss=0.1153, beats_loss=0.008149, ecapa_loss=0.0001376, whisper_loss=0.1058, over 16734.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.00014, whisper_loss=0.08977, over 3823579.95 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:45:32,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.309e+01 2.539e+01 2.876e+01 1.240e+03, threshold=5.077e+01, percent-clipped=3.0
2024-08-20 05:45:33,812 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 17 from LS+wenet, 21 from Vox, 46 from AS
2024-08-20 05:46:37,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4681600.0, ans=0.125
2024-08-20 05:46:55,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4681700.0, ans=0.1
2024-08-20 05:47:03,870 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8800, loss[loss=0.1072, beats_loss=0.01121, ecapa_loss=0.0001532, whisper_loss=0.09445, over 22699.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.08919, over 3798512.69 frames. ], batch size: 95, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:47:11,104 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 18 from LS+wenet, 24 from Vox, 31 from AS
2024-08-20 05:47:25,316 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0940530002117157, model_norm_threshold=50.77210235595703
2024-08-20 05:47:25,479 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.105e+04, grad_sumsq=6.105e+04, orig_rms_sq=1.000e+00
2024-08-20 05:48:09,838 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 from AS
2024-08-20 05:48:11,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4682100.0, ans=0.0
2024-08-20 05:48:17,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4682100.0, ans=0.0
2024-08-20 05:48:36,396 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0
2024-08-20 05:48:40,158 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4682200.0, ans=0.0
2024-08-20 05:48:42,394 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8850, loss[loss=0.1, beats_loss=0.01137, ecapa_loss=0.0001419, whisper_loss=0.08721, over 21868.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08934, over 3797546.42 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:48:52,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=22.5
2024-08-20 05:48:54,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4682300.0, ans=0.125
2024-08-20 05:48:58,981 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.309e+01 2.540e+01 2.877e+01 5.398e+02, threshold=5.080e+01, percent-clipped=3.0
2024-08-20 05:49:02,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4682400.0, ans=0.2
2024-08-20 05:49:17,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4682400.0, ans=0.125
2024-08-20 05:49:26,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4682500.0, ans=0.0
2024-08-20 05:49:47,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=4682600.0, ans=0.2
2024-08-20 05:50:12,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0
2024-08-20 05:50:18,900 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8900, loss[loss=0.07884, beats_loss=0.01145, ecapa_loss=0.0001287, whisper_loss=0.0661, over 14996.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.08927, over 3766227.22 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:50:39,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4682900.0, ans=0.125
2024-08-20 05:50:41,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5
2024-08-20 05:50:45,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4682900.0, ans=0.0
2024-08-20 05:51:04,886 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0
2024-08-20 05:51:06,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4683000.0, ans=0.125
2024-08-20 05:51:30,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4683100.0, ans=0.125
2024-08-20 05:51:31,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4683100.0, ans=0.125
2024-08-20 05:51:41,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=12.0
2024-08-20 05:51:43,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4683200.0, ans=0.125
2024-08-20 05:51:45,055 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 from AS
2024-08-20 05:51:55,966 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 8950, loss[loss=0.1043, beats_loss=0.01169, ecapa_loss=0.0001507, whisper_loss=0.09115, over 16759.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.08981, over 3803539.23 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:52:12,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.265e+01 2.516e+01 2.733e+01 4.609e+01, threshold=5.033e+01, percent-clipped=0.0
2024-08-20 05:52:18,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4683400.0, ans=0.125
2024-08-20 05:52:45,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4683500.0, ans=0.125
2024-08-20 05:52:50,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4683600.0, ans=0.125
2024-08-20 05:53:01,943 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0
2024-08-20 05:53:02,042 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=12.0
2024-08-20 05:53:12,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4683700.0, ans=0.125
2024-08-20 05:53:17,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2024-08-20 05:53:25,536 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9000, loss[loss=0.1073, beats_loss=0.01098, ecapa_loss=0.0001241, whisper_loss=0.09509, over 19585.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.08925, over 3793971.67 frames. ], batch size: 75, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:53:25,537 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss
2024-08-20 05:54:02,437 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2489, over 931116.00 frames.
2024-08-20 05:54:24,132 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on SV_voxceleb1: loss=0.004011, beats_loss=0, ecapa_loss=0.0004011, whisper_loss=0, over 944235.00 frames.
2024-08-20 05:56:01,392 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on AT_audioset: loss=0.02303, beats_loss=0.02303, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 05:56:01,396 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB
2024-08-20 05:56:04,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4683800.0, ans=0.05
2024-08-20 05:56:08,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4683800.0, ans=0.0
2024-08-20 05:56:10,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4683800.0, ans=0.125
2024-08-20 05:56:13,249 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts.
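The per-batch `loss[...]` records above are numerically consistent with a weighted sum of the three distillation losses, using the scales that appear in the experiment-directory name in the header (`beats ... scale_1.0 ... ecapa ... scale_10.0`). This reconstruction is an inference from the logged numbers, not a quote of the icefall source; the validation records, by contrast, show each task scored in isolation (the other two losses reported as 0). A minimal sketch, assuming those scales:

```python
def combined_loss(beats_loss, ecapa_loss, whisper_loss,
                  beats_scale=1.0, ecapa_scale=10.0):
    """Reconstructed total: beats_scale * beats + ecapa_scale * ecapa + whisper.
    Scales are assumptions taken from the exp-dir name, not from source code."""
    return beats_scale * beats_loss + ecapa_scale * ecapa_loss + whisper_loss

# Checking against the batch-9000 record above:
# 0.01098 + 10 * 0.0001241 + 0.09509 = 0.10731, vs. logged loss=0.1073
```

The same relation holds for the other batch records in this log (e.g. batch 8600: 0.008435 + 10 * 0.0001312 + 0.1209 = 0.13065, vs. logged 0.1306), which is what makes the scale assumption plausible.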
22 from LS+wenet, 19 from Vox, 32 from AS
2024-08-20 05:56:13,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4683800.0, ans=0.125
2024-08-20 05:56:16,872 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 from AS
2024-08-20 05:56:43,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4684000.0, ans=0.125
2024-08-20 05:57:03,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4684100.0, ans=0.125
2024-08-20 05:57:09,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4684200.0, ans=0.125
2024-08-20 05:57:15,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4684200.0, ans=0.0
2024-08-20 05:57:19,124 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 21 from LS+wenet, 20 from Vox, 20 from AS
2024-08-20 05:57:24,001 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9050, loss[loss=0.1039, beats_loss=0.01061, ecapa_loss=0.0001221, whisper_loss=0.09203, over 15257.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.0889, over 3772256.98 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:57:27,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4684300.0, ans=0.125
2024-08-20 05:57:38,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.206e+01 2.470e+01 2.742e+01 4.296e+01, threshold=4.939e+01, percent-clipped=0.0
2024-08-20 05:57:42,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4684400.0, ans=0.1
2024-08-20 05:57:52,819 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4684400.0, ans=0.125
2024-08-20 05:58:05,332 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=15.0
2024-08-20 05:58:45,889 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9100, loss[loss=0.09101, beats_loss=0.01016, ecapa_loss=0.0001615, whisper_loss=0.07923, over 21828.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01053, ecapa_loss=0.000139, whisper_loss=0.08844, over 3766654.03 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:58:49,579 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 8 from Vox, 32 from AS
2024-08-20 05:59:11,410 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 16 from LS+wenet, 10 from Vox, 23 from AS
2024-08-20 05:59:14,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4684900.0, ans=0.1
2024-08-20 05:59:24,914 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 21 from LS+wenet, 30 from Vox, 38 from AS
2024-08-20 05:59:26,657 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 from AS
2024-08-20 05:59:44,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4685100.0, ans=0.0
2024-08-20 06:00:03,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4685200.0, ans=0.0
2024-08-20 06:00:10,210 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9150, loss[loss=0.08858, beats_loss=0.01241, ecapa_loss=0.0001535, whisper_loss=0.07464, over 21057.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01053, ecapa_loss=0.0001385, whisper_loss=0.08891, over 3780684.06 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:00:27,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.310e+01 2.550e+01 2.853e+01 1.227e+02, threshold=5.100e+01, percent-clipped=1.0
2024-08-20 06:00:43,937 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS
2024-08-20 06:00:55,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4685500.0, ans=0.125
2024-08-20 06:01:10,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4685600.0, ans=0.1
2024-08-20 06:01:17,082 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 from AS
2024-08-20 06:01:34,608 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 from AS
2024-08-20 06:01:35,572 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9200, loss[loss=0.09699, beats_loss=0.01007, ecapa_loss=0.0001199, whisper_loss=0.08572, over 17779.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01061, ecapa_loss=0.0001387, whisper_loss=0.0885, over 3792201.43 frames. ], batch size: 69, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:01:40,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4685800.0, ans=0.125
2024-08-20 06:01:48,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4685800.0, ans=0.2
2024-08-20 06:02:01,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4685900.0, ans=0.0
2024-08-20 06:02:16,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4686000.0, ans=0.125
2024-08-20 06:02:27,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4686100.0, ans=0.0
2024-08-20 06:02:31,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4686100.0, ans=0.1
2024-08-20 06:02:47,924 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 27 from LS+wenet, 15 from Vox, 27 from AS
2024-08-20 06:02:55,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4686200.0, ans=0.95
2024-08-20 06:03:02,850 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9250, loss[loss=0.09583, beats_loss=0.01297, ecapa_loss=0.0001054, whisper_loss=0.08181, over 17191.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0106, ecapa_loss=0.0001386, whisper_loss=0.08892, over 3801280.96 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:03:10,755 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 36 from LS+wenet, 20 from Vox, 31 from AS
2024-08-20 06:03:20,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.318e+01 2.500e+01 2.733e+01 3.571e+01, threshold=4.999e+01, percent-clipped=0.0
2024-08-20 06:03:25,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4686400.0, ans=0.2
2024-08-20 06:03:55,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.62 vs. limit=6.0
2024-08-20 06:03:58,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4686600.0, ans=0.0
2024-08-20 06:04:05,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4686600.0, ans=0.0
2024-08-20 06:04:18,554 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 from AS
2024-08-20 06:04:22,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4686700.0, ans=0.0
2024-08-20 06:04:25,714 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 19 from LS+wenet, 23 from Vox, 45 from AS
2024-08-20 06:04:31,234 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9300, loss[loss=0.09366, beats_loss=0.009724, ecapa_loss=0.0001439, whisper_loss=0.0825, over 12660.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001388, whisper_loss=0.08967, over 3806107.84 frames. ], batch size: 49, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:04:46,217 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 29 from LS+wenet, 19 from Vox, 46 from AS
2024-08-20 06:04:48,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4686900.0, ans=0.0
2024-08-20 06:04:52,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0
2024-08-20 06:04:58,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4686900.0, ans=0.0
2024-08-20 06:05:26,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4687100.0, ans=0.2
2024-08-20 06:05:26,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4687100.0, ans=0.125
2024-08-20 06:05:56,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4687200.0, ans=0.125
2024-08-20 06:06:05,675 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 from AS
2024-08-20 06:06:08,692 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9350, loss[loss=0.09678, beats_loss=0.01115, ecapa_loss=0.0001088, whisper_loss=0.08455, over 17559.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.000138, whisper_loss=0.08947, over 3807305.55 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:06:09,800 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 from AS
2024-08-20 06:06:18,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0
2024-08-20 06:06:21,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4687300.0, ans=0.125
2024-08-20 06:06:27,623 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 from AS
2024-08-20 06:06:28,518 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.303e+01 2.586e+01 2.791e+01 3.756e+01, threshold=5.173e+01, percent-clipped=0.0
2024-08-20 06:06:32,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=4687400.0, ans=0.2
2024-08-20 06:06:40,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4687400.0, ans=0.125
2024-08-20 06:07:03,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4687600.0, ans=0.0
2024-08-20 06:07:07,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4687600.0, ans=0.125
2024-08-20 06:07:10,325 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 25 from LS+wenet, 24 from Vox, 26 from AS
2024-08-20 06:07:13,318 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0
2024-08-20 06:07:38,235 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 from AS
2024-08-20 06:07:39,644 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9400, loss[loss=0.09556, beats_loss=0.01094, ecapa_loss=0.0001627, whisper_loss=0.08299, over 21991.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01057, ecapa_loss=0.0001387, whisper_loss=0.08868, over 3772150.64 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:07:46,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4687800.0, ans=0.0
2024-08-20 06:07:50,357 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 17 from Vox, 48 from AS
2024-08-20 06:08:21,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4688000.0, ans=0.125
2024-08-20 06:08:21,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4688000.0, ans=0.0
2024-08-20 06:08:25,750 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 06:08:27,058 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS
2024-08-20 06:08:39,713 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 from AS
2024-08-20 06:08:48,596 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS
2024-08-20 06:09:08,342 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9450, loss[loss=0.0896, beats_loss=0.01177, ecapa_loss=0.0001617, whisper_loss=0.07621, over 20701.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001386, whisper_loss=0.08939, over 3817918.59 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:09:23,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4688300.0, ans=0.125
2024-08-20 06:09:27,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=15.0
2024-08-20 06:09:27,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.376e+01 2.594e+01 2.934e+01 1.922e+02, threshold=5.189e+01, percent-clipped=1.0
2024-08-20 06:09:28,503 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS
2024-08-20 06:09:36,843 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 from AS
2024-08-20 06:10:07,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4688600.0, ans=0.0
2024-08-20 06:10:10,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4688600.0, ans=0.125
2024-08-20 06:10:35,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4688800.0, ans=0.2
2024-08-20 06:10:36,259 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9500, loss[loss=0.1095, beats_loss=0.01084, ecapa_loss=0.0001613, whisper_loss=0.09708, over 22595.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001391, whisper_loss=0.08951, over 3836268.61 frames. ], batch size: 95, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:10:48,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4688800.0, ans=0.125
2024-08-20 06:10:57,783 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS
2024-08-20 06:11:02,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0
2024-08-20 06:11:27,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4689100.0, ans=0.0
2024-08-20 06:11:35,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4689100.0, ans=0.07
2024-08-20 06:11:49,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4689200.0, ans=0.0
2024-08-20 06:11:58,811 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 from AS
2024-08-20 06:12:00,559 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 from AS
2024-08-20 06:12:02,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4689300.0, ans=0.1
2024-08-20 06:12:03,402 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9550, loss[loss=0.1029, beats_loss=0.01113, ecapa_loss=0.0001089, whisper_loss=0.09065, over 17635.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001393, whisper_loss=0.08971, over 3814551.97 frames. ], batch size: 67, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:12:14,645 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 33 from LS+wenet, 16 from Vox, 34 from AS
2024-08-20 06:12:21,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.271e+01 2.487e+01 2.797e+01 1.341e+02, threshold=4.974e+01, percent-clipped=1.0
2024-08-20 06:12:43,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4689500.0, ans=0.125
2024-08-20 06:13:02,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.76 vs. limit=8.0
2024-08-20 06:13:07,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4689600.0, ans=0.125
2024-08-20 06:13:16,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4689700.0, ans=0.0
2024-08-20 06:13:23,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4689700.0, ans=0.1
2024-08-20 06:13:29,724 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 from AS
2024-08-20 06:13:32,315 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9600, loss[loss=0.07114, beats_loss=0.01201, ecapa_loss=0.0001381, whisper_loss=0.05775, over 13442.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001402, whisper_loss=0.08874, over 3796179.11 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:13:32,526 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 from AS
2024-08-20 06:13:41,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4689800.0, ans=0.2
2024-08-20 06:13:41,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4689800.0, ans=0.125
2024-08-20 06:13:49,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4689900.0, ans=0.125
2024-08-20 06:13:57,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0
2024-08-20 06:14:09,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0
2024-08-20 06:14:19,117 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 14 from LS+wenet, 16 from Vox, 19 from AS
2024-08-20 06:14:30,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4690100.0, ans=15.0
2024-08-20 06:14:34,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4690100.0, ans=0.035
2024-08-20 06:14:44,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4690100.0, ans=0.125
2024-08-20 06:14:45,614 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 21 from LS+wenet, 24 from Vox, 47 from AS
2024-08-20 06:14:53,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4690200.0, ans=0.0
2024-08-20 06:14:55,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2024-08-20 06:15:06,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4690300.0, ans=0.125
2024-08-20 06:15:06,762 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9650, loss[loss=0.1021, beats_loss=0.01022, ecapa_loss=0.0001603, whisper_loss=0.09027, over 17722.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01061, ecapa_loss=0.0001401, whisper_loss=0.08836, over 3817545.44 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 06:15:22,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. limit=10.0
2024-08-20 06:15:26,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.779e+01 3.042e+01 4.169e+01, threshold=5.558e+01, percent-clipped=0.0
2024-08-20 06:15:30,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4690400.0, ans=0.2
2024-08-20 06:15:36,634 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 14 from Vox, 44 from AS
2024-08-20 06:15:39,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4690400.0, ans=0.1
2024-08-20 06:16:05,275 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 17 from LS+wenet, 24 from Vox, 26 from AS
2024-08-20 06:16:20,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4690700.0, ans=0.2
2024-08-20 06:16:23,210 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS
2024-08-20 06:16:32,795 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9700, loss[loss=0.1115, beats_loss=0.01031, ecapa_loss=0.0001271, whisper_loss=0.09988, over 22932.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01059, ecapa_loss=0.000141, whisper_loss=0.08875, over 3846727.76 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 06:16:32,956 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 from AS
2024-08-20 06:16:38,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4690800.0, ans=0.1
2024-08-20 06:16:48,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5
2024-08-20 06:16:51,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4690900.0, ans=0.125
2024-08-20 06:16:51,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4690900.0, ans=0.125
2024-08-20 06:17:04,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0
2024-08-20 06:17:11,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4691000.0, ans=0.0
2024-08-20 06:17:14,200 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 06:17:30,394 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 from AS
2024-08-20 06:17:35,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4691100.0, ans=0.125
2024-08-20 06:17:38,367 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 from AS
2024-08-20 06:17:50,361 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 from AS
2024-08-20 06:17:54,435 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9750, loss[loss=0.09697, beats_loss=0.01156, ecapa_loss=0.000156, whisper_loss=0.08385, over 18518.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001418, whisper_loss=0.08893, over 3836334.11 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 06:17:57,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.17 vs.
limit=15.0 2024-08-20 06:18:12,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.245e+01 2.617e+01 2.841e+01 5.114e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-20 06:18:24,639 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 18 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 06:18:24,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4691400.0, ans=0.2 2024-08-20 06:18:35,966 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.973e+00 2024-08-20 06:18:42,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4691600.0, ans=0.0 2024-08-20 06:19:11,939 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 06:19:16,750 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9800, loss[loss=0.1012, beats_loss=0.01406, ecapa_loss=0.0001175, whisper_loss=0.08594, over 22303.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.08975, over 3840180.71 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:19:27,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4691800.0, ans=0.125 2024-08-20 06:19:38,060 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 06:20:08,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4692100.0, ans=0.1 2024-08-20 06:20:11,917 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 06:20:14,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.28 vs. limit=10.0 2024-08-20 06:20:39,623 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9850, loss[loss=0.08802, beats_loss=0.01249, ecapa_loss=8.548e-05, whisper_loss=0.07468, over 18671.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001411, whisper_loss=0.08928, over 3803764.96 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:20:39,840 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 06:20:58,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.366e+01 2.568e+01 2.856e+01 6.259e+01, threshold=5.136e+01, percent-clipped=2.0 2024-08-20 06:21:24,722 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 06:21:28,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-20 06:21:32,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4692600.0, ans=0.125 2024-08-20 06:21:34,912 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 11 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 06:21:57,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4692700.0, ans=0.025 2024-08-20 06:22:02,851 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9900, loss[loss=0.09319, beats_loss=0.01271, ecapa_loss=0.0001259, whisper_loss=0.07923, over 21335.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001399, whisper_loss=0.08956, over 3829802.08 frames. 
], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:22:06,619 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 06:22:07,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4692800.0, ans=0.0 2024-08-20 06:22:21,083 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 06:22:28,964 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2024-08-20 06:22:32,404 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 25 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-20 06:22:32,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4692900.0, ans=0.125 2024-08-20 06:22:34,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4693000.0, ans=0.125 2024-08-20 06:23:02,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4693100.0, ans=0.125 2024-08-20 06:23:24,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4693300.0, ans=0.2 2024-08-20 06:23:24,909 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 9950, loss[loss=0.09291, beats_loss=0.01208, ecapa_loss=0.0001338, whisper_loss=0.0795, over 22154.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001401, whisper_loss=0.08948, over 3871376.78 frames. 
], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:23:40,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4693400.0, ans=0.125 2024-08-20 06:23:42,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.243e+01 2.443e+01 2.710e+01 3.765e+01, threshold=4.885e+01, percent-clipped=0.0 2024-08-20 06:24:04,373 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-20 06:24:04,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4693500.0, ans=0.1 2024-08-20 06:24:06,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4693500.0, ans=0.2 2024-08-20 06:24:11,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4693500.0, ans=0.0 2024-08-20 06:24:14,509 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 06:24:23,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4693600.0, ans=0.125 2024-08-20 06:24:23,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-20 06:24:31,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4693700.0, ans=0.125 2024-08-20 06:24:36,492 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
23 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 06:24:42,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4693700.0, ans=0.0 2024-08-20 06:24:49,111 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10000, loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001228, whisper_loss=0.09084, over 22867.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.08981, over 3822072.50 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:24:50,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4693800.0, ans=0.125 2024-08-20 06:24:54,846 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-20 06:25:06,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4693900.0, ans=0.125 2024-08-20 06:25:29,613 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 14 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 06:25:31,105 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
15 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 06:25:40,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4694100.0, ans=0.0 2024-08-20 06:25:45,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4694100.0, ans=0.125 2024-08-20 06:25:57,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4694100.0, ans=0.0 2024-08-20 06:25:59,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4694200.0, ans=0.025 2024-08-20 06:26:04,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4694200.0, ans=0.125 2024-08-20 06:26:19,063 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10050, loss[loss=0.1124, beats_loss=0.01088, ecapa_loss=0.0001552, whisper_loss=0.1, over 19517.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.000141, whisper_loss=0.08994, over 3831638.61 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:26:19,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4694300.0, ans=0.1 2024-08-20 06:26:22,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4694300.0, ans=0.0 2024-08-20 06:26:37,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.304e+01 2.607e+01 2.920e+01 4.346e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-20 06:27:27,095 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
29 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 06:27:31,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=12.0 2024-08-20 06:27:47,733 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10100, loss[loss=0.09472, beats_loss=0.01117, ecapa_loss=0.0001271, whisper_loss=0.08228, over 23424.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001421, whisper_loss=0.09001, over 3858155.63 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:27:58,342 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 21 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-20 06:27:59,551 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 06:28:00,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4694800.0, ans=0.125 2024-08-20 06:28:02,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4694800.0, ans=0.0 2024-08-20 06:28:07,018 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 06:28:32,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-20 06:29:00,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4695100.0, ans=0.125 2024-08-20 06:29:18,413 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 06:29:20,728 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 06:29:22,550 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10150, loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001186, whisper_loss=0.0916, over 16820.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001414, whisper_loss=0.08925, over 3844865.52 frames. ], batch size: 66, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:29:44,564 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.187e+01 2.409e+01 2.808e+01 3.836e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-20 06:29:54,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4695400.0, ans=0.125 2024-08-20 06:29:58,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4695400.0, ans=0.0 2024-08-20 06:30:00,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4695400.0, ans=0.0 2024-08-20 06:30:18,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4695500.0, ans=0.125 2024-08-20 06:30:27,200 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
33 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 06:30:32,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4695600.0, ans=0.0 2024-08-20 06:30:41,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4695600.0, ans=0.1 2024-08-20 06:30:50,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4695700.0, ans=0.1 2024-08-20 06:30:52,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4695700.0, ans=0.025 2024-08-20 06:31:02,373 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10200, loss[loss=0.09799, beats_loss=0.009912, ecapa_loss=0.0001464, whisper_loss=0.08661, over 16343.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001409, whisper_loss=0.08912, over 3826085.19 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:31:08,660 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 15 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 06:31:15,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4695800.0, ans=0.1 2024-08-20 06:31:31,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4695900.0, ans=0.125 2024-08-20 06:31:51,542 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.40 vs. 
limit=22.5 2024-08-20 06:32:08,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4696100.0, ans=0.1 2024-08-20 06:32:10,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2024-08-20 06:32:15,374 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 36 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 06:32:20,244 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2024-08-20 06:32:22,939 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 06:32:37,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-20 06:32:39,494 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10250, loss[loss=0.1025, beats_loss=0.008618, ecapa_loss=0.0001403, whisper_loss=0.09246, over 17048.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.0001409, whisper_loss=0.0888, over 3812789.43 frames. 
], batch size: 64, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:32:55,746 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07058558613061905, model_norm_threshold=48.17802047729492 2024-08-20 06:32:55,910 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.542e+04, grad_sumsq=7.542e+04, orig_rms_sq=1.000e+00 2024-08-20 06:33:01,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.230e+01 2.493e+01 2.759e+01 6.825e+02, threshold=4.986e+01, percent-clipped=2.0 2024-08-20 06:33:05,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4696400.0, ans=0.0 2024-08-20 06:33:21,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4696500.0, ans=0.125 2024-08-20 06:33:25,152 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:33:28,188 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 10 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 06:33:28,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4696500.0, ans=0.2 2024-08-20 06:33:34,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4696500.0, ans=0.0 2024-08-20 06:33:42,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4696600.0, ans=0.0 2024-08-20 06:33:49,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.83 vs. 
limit=22.5 2024-08-20 06:34:02,429 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07055973261594772, model_norm_threshold=49.85801315307617 2024-08-20 06:34:02,593 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.714e+04, grad_sumsq=4.714e+04, orig_rms_sq=1.000e+00 2024-08-20 06:34:22,328 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10300, loss[loss=0.09847, beats_loss=0.009122, ecapa_loss=0.0001539, whisper_loss=0.0878, over 15343.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001419, whisper_loss=0.0887, over 3806131.02 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:34:30,847 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 06:34:49,047 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 06:34:57,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2024-08-20 06:35:05,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4697000.0, ans=0.125 2024-08-20 06:35:16,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4697000.0, ans=0.05 2024-08-20 06:35:30,864 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-20 06:35:35,878 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 23 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 06:35:39,771 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 06:35:45,114 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 06:35:50,431 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 06:35:56,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4697200.0, ans=0.1 2024-08-20 06:36:04,310 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10350, loss[loss=0.1094, beats_loss=0.01233, ecapa_loss=0.0001252, whisper_loss=0.0958, over 18360.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01059, ecapa_loss=0.000142, whisper_loss=0.08825, over 3823755.32 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:36:11,716 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 06:36:27,481 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.261e+01 2.488e+01 2.810e+01 7.066e+02, threshold=4.977e+01, percent-clipped=3.0 2024-08-20 06:36:36,029 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-20 06:37:10,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4697600.0, ans=0.04949747468305833 2024-08-20 06:37:14,155 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-20 06:37:26,188 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 20 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 06:37:43,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.23 vs. 
limit=15.0 2024-08-20 06:37:47,017 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10400, loss[loss=0.09956, beats_loss=0.009293, ecapa_loss=0.0001434, whisper_loss=0.08883, over 12890.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.08873, over 3837288.26 frames. ], batch size: 50, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:37:57,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4697800.0, ans=0.125 2024-08-20 06:38:05,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4697800.0, ans=0.125 2024-08-20 06:38:12,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4697900.0, ans=0.0 2024-08-20 06:38:14,654 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-20 06:38:16,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4697900.0, ans=0.0 2024-08-20 06:38:18,265 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 06:38:20,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.75 vs. 
limit=12.0 2024-08-20 06:38:39,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4698000.0, ans=0.0 2024-08-20 06:38:43,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4698000.0, ans=0.2 2024-08-20 06:39:19,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4698200.0, ans=0.0 2024-08-20 06:39:20,807 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 06:39:31,460 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10450, loss[loss=0.1007, beats_loss=0.009785, ecapa_loss=0.000149, whisper_loss=0.08943, over 22431.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001416, whisper_loss=0.08924, over 3848494.39 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:39:41,886 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 06:39:53,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.263e+01 2.457e+01 2.777e+01 9.468e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-20 06:39:59,927 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 06:40:14,141 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. 
limit=10.0 2024-08-20 06:40:17,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4698500.0, ans=0.2 2024-08-20 06:40:36,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4698600.0, ans=0.125 2024-08-20 06:40:40,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4698600.0, ans=0.07 2024-08-20 06:40:43,281 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 06:40:44,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4698600.0, ans=0.1 2024-08-20 06:40:48,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4698600.0, ans=0.125 2024-08-20 06:40:48,911 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2024-08-20 06:40:50,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4698700.0, ans=0.125 2024-08-20 06:40:56,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4698700.0, ans=0.1 2024-08-20 06:41:04,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4698700.0, ans=0.125 2024-08-20 06:41:11,107 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10500, loss[loss=0.09707, beats_loss=0.00984, ecapa_loss=0.000164, whisper_loss=0.08559, over 19217.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.08911, over 3837607.95 frames. 
], batch size: 79, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:41:26,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4698800.0, ans=0.125 2024-08-20 06:41:55,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4699000.0, ans=0.0 2024-08-20 06:41:59,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4699000.0, ans=0.0 2024-08-20 06:42:04,659 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2024-08-20 06:42:13,085 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-20 06:42:18,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4699100.0, ans=0.0 2024-08-20 06:42:21,303 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 06:42:21,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4699100.0, ans=0.2 2024-08-20 06:42:23,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4699100.0, ans=0.0 2024-08-20 06:42:48,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4699200.0, ans=0.2 2024-08-20 06:42:54,632 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10550, loss[loss=0.1126, beats_loss=0.00967, ecapa_loss=0.00012, whisper_loss=0.1018, over 20365.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.08886, over 3854426.78 frames. 
], batch size: 79, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:43:06,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4699300.0, ans=0.1 2024-08-20 06:43:17,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.308e+01 2.564e+01 2.826e+01 3.881e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-20 06:43:21,739 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 25 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 06:43:25,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-20 06:43:29,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4699400.0, ans=0.125 2024-08-20 06:43:55,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-20 06:44:05,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-20 06:44:08,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4699600.0, ans=0.125 2024-08-20 06:44:38,385 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10600, loss[loss=0.1362, beats_loss=0.008627, ecapa_loss=0.0001387, whisper_loss=0.1262, over 23233.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.08894, over 3832556.04 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:44:38,559 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 06:44:56,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4699800.0, ans=0.125 2024-08-20 06:44:58,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4699900.0, ans=0.125 2024-08-20 06:45:12,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4699900.0, ans=0.125 2024-08-20 06:45:15,159 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 06:45:56,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4700100.0, ans=0.0 2024-08-20 06:45:57,647 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 06:45:58,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4700100.0, ans=0.1 2024-08-20 06:46:00,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4700100.0, ans=10.0 2024-08-20 06:46:12,222 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 06:46:16,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4700200.0, ans=0.125 2024-08-20 06:46:24,870 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10650, loss[loss=0.08254, beats_loss=0.01087, ecapa_loss=0.0001488, whisper_loss=0.07018, over 19289.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01056, ecapa_loss=0.0001405, whisper_loss=0.08813, over 3820395.32 frames. 
], batch size: 81, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:46:28,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4700300.0, ans=0.0 2024-08-20 06:46:46,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.297e+01 2.515e+01 2.879e+01 5.897e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-20 06:46:53,566 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 15 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 06:47:36,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4700600.0, ans=0.07 2024-08-20 06:48:02,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4700700.0, ans=0.0 2024-08-20 06:48:09,994 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10700, loss[loss=0.0979, beats_loss=0.01124, ecapa_loss=0.0001351, whisper_loss=0.08531, over 20381.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001401, whisper_loss=0.08894, over 3811423.36 frames. ], batch size: 82, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:48:40,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4700900.0, ans=0.1 2024-08-20 06:49:15,438 WARNING [optim.py:496] (2/4) Scaling gradients by 0.023652782663702965, model_norm_threshold=50.29466247558594 2024-08-20 06:49:15,601 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.689e+06, grad_sumsq=1.581e+08, orig_rms_sq=1.068e-02 2024-08-20 06:49:20,846 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 06:49:30,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4701200.0, ans=0.125 2024-08-20 06:49:50,249 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10750, loss[loss=0.08796, beats_loss=0.01274, ecapa_loss=0.0001259, whisper_loss=0.07396, over 12954.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.08954, over 3793013.02 frames. ], batch size: 50, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:49:53,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4701300.0, ans=0.125 2024-08-20 06:49:53,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4701300.0, ans=0.1 2024-08-20 06:50:09,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4701400.0, ans=0.125 2024-08-20 06:50:11,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.335e+01 2.524e+01 2.835e+01 2.126e+03, threshold=5.048e+01, percent-clipped=3.0 2024-08-20 06:50:25,527 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2024-08-20 06:50:39,302 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 14 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 06:50:56,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2024-08-20 06:51:14,920 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 06:51:27,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4701700.0, ans=0.1 2024-08-20 06:51:29,633 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10800, loss[loss=0.08601, beats_loss=0.01112, ecapa_loss=0.0001401, whisper_loss=0.07348, over 16573.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01053, ecapa_loss=0.0001399, whisper_loss=0.08912, over 3839713.16 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:51:36,797 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 21 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 06:51:52,699 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 06:52:13,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4702000.0, ans=0.125 2024-08-20 06:52:20,664 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 21 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-20 06:52:33,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0 2024-08-20 06:52:43,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4702100.0, ans=0.2 2024-08-20 06:52:46,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2024-08-20 06:52:47,656 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 06:53:08,927 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10850, loss[loss=0.1075, beats_loss=0.01096, ecapa_loss=0.0001375, whisper_loss=0.09518, over 21926.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08895, over 3841135.95 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:53:10,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-08-20 06:53:30,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.288e+01 2.456e+01 2.756e+01 3.873e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-20 06:53:36,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4702400.0, ans=0.125 2024-08-20 06:53:40,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4702400.0, ans=0.0 2024-08-20 06:54:11,727 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 06:54:14,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2024-08-20 06:54:40,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4702700.0, ans=0.0 2024-08-20 06:54:48,400 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10900, loss[loss=0.08802, beats_loss=0.009433, ecapa_loss=0.0001358, whisper_loss=0.07723, over 15516.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001387, whisper_loss=0.08937, over 3827768.13 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:54:53,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4702800.0, ans=0.025 2024-08-20 06:55:18,094 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 06:55:25,834 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 06:56:09,476 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 06:56:25,273 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 10950, loss[loss=0.0957, beats_loss=0.01035, ecapa_loss=0.0001327, whisper_loss=0.08402, over 15397.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.08945, over 3811704.74 frames. ], batch size: 62, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:56:47,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.264e+01 2.421e+01 2.646e+01 4.130e+01, threshold=4.843e+01, percent-clipped=0.0 2024-08-20 06:56:51,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2024-08-20 06:56:53,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4703400.0, ans=0.125 2024-08-20 06:56:57,325 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04336608201265335, model_norm_threshold=48.42934799194336 2024-08-20 06:56:57,490 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.187e+05, grad_sumsq=2.051e+07, orig_rms_sq=1.067e-02 2024-08-20 06:57:17,677 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.00 vs. limit=10.0 2024-08-20 06:57:25,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4703600.0, ans=0.125 2024-08-20 06:57:31,149 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 06:57:56,653 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11000, loss[loss=0.1021, beats_loss=0.01147, ecapa_loss=0.0001124, whisper_loss=0.08947, over 19274.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.08977, over 3789844.74 frames. ], batch size: 74, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:57:59,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-08-20 06:58:03,352 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06608612835407257, model_norm_threshold=48.42934799194336 2024-08-20 06:58:03,515 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.920e+04, grad_sumsq=8.920e+04, orig_rms_sq=1.000e+00 2024-08-20 06:58:07,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4703800.0, ans=0.125 2024-08-20 06:58:10,702 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 36 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 06:58:14,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4703900.0, ans=0.0 2024-08-20 06:59:16,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2024-08-20 06:59:23,035 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11050, loss[loss=0.09546, beats_loss=0.01014, ecapa_loss=0.0001222, whisper_loss=0.08409, over 20737.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001396, whisper_loss=0.09026, over 3824630.23 frames. 
], batch size: 82, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:59:35,113 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.256e+00 2024-08-20 06:59:43,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.551e+01 2.950e+01 1.117e+03, threshold=5.103e+01, percent-clipped=5.0 2024-08-20 07:00:02,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4704500.0, ans=0.125 2024-08-20 07:00:19,372 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 07:00:22,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-20 07:00:32,886 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 07:00:57,179 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11100, loss[loss=0.1017, beats_loss=0.01021, ecapa_loss=0.0001292, whisper_loss=0.09019, over 22885.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001389, whisper_loss=0.09048, over 3842369.70 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:00:58,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4704800.0, ans=0.125 2024-08-20 07:01:14,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. 
limit=22.5 2024-08-20 07:01:43,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4705000.0, ans=0.0 2024-08-20 07:01:43,963 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.701e+00 2024-08-20 07:01:50,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4705000.0, ans=0.0 2024-08-20 07:02:11,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-20 07:02:18,781 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-20 07:02:22,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4705200.0, ans=0.2 2024-08-20 07:02:27,300 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 07:02:30,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4705200.0, ans=0.125 2024-08-20 07:02:34,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4705300.0, ans=0.2 2024-08-20 07:02:35,150 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11150, loss[loss=0.1147, beats_loss=0.01067, ecapa_loss=0.0001296, whisper_loss=0.1027, over 22670.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001377, whisper_loss=0.09034, over 3849411.89 frames. 
], batch size: 88, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:02:47,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4705300.0, ans=0.125 2024-08-20 07:02:51,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4705300.0, ans=0.0 2024-08-20 07:02:59,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.354e+01 2.488e+01 2.886e+01 1.211e+02, threshold=4.976e+01, percent-clipped=2.0 2024-08-20 07:03:18,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4705500.0, ans=0.0 2024-08-20 07:03:18,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4705500.0, ans=15.0 2024-08-20 07:03:20,181 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 07:04:01,764 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09045316278934479, model_norm_threshold=49.755611419677734 2024-08-20 07:04:01,928 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.284e+04, grad_sumsq=3.284e+04, orig_rms_sq=1.000e+00 2024-08-20 07:04:20,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4705800.0, ans=0.1 2024-08-20 07:04:21,336 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11200, loss[loss=0.09867, beats_loss=0.01197, ecapa_loss=0.0001523, whisper_loss=0.08518, over 22489.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001384, whisper_loss=0.08995, over 3856251.86 frames. 
], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:04:24,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4705800.0, ans=0.0 2024-08-20 07:04:57,746 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 07:05:41,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4706200.0, ans=0.125 2024-08-20 07:05:49,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4706200.0, ans=0.2 2024-08-20 07:05:53,032 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 07:06:00,536 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11250, loss[loss=0.1083, beats_loss=0.012, ecapa_loss=7.136e-05, whisper_loss=0.09558, over 16390.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001395, whisper_loss=0.09009, over 3809208.73 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:06:02,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4706300.0, ans=0.125 2024-08-20 07:06:08,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4706300.0, ans=0.125 2024-08-20 07:06:23,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.303e+01 2.565e+01 2.928e+01 5.501e+02, threshold=5.130e+01, percent-clipped=1.0 2024-08-20 07:06:23,406 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 07:06:35,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4706400.0, ans=0.125 2024-08-20 07:06:35,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4706400.0, ans=0.125 2024-08-20 07:06:39,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4706400.0, ans=0.125 2024-08-20 07:06:39,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4706400.0, ans=0.2 2024-08-20 07:06:40,444 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 07:06:46,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4706500.0, ans=0.125 2024-08-20 07:07:08,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4706600.0, ans=0.125 2024-08-20 07:07:10,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4706600.0, ans=0.125 2024-08-20 07:07:12,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4706600.0, ans=0.125 2024-08-20 07:07:14,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4706600.0, ans=0.125 2024-08-20 07:07:32,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4706700.0, ans=0.125 2024-08-20 07:07:39,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4706800.0, ans=0.125 2024-08-20 07:07:40,633 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, 
batch 11300, loss[loss=0.09507, beats_loss=0.01072, ecapa_loss=0.0001243, whisper_loss=0.08311, over 19123.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.08996, over 3801920.83 frames. ], batch size: 74, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:07:42,876 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2024-08-20 07:07:50,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4706800.0, ans=0.125 2024-08-20 07:07:51,647 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 07:07:58,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=12.0 2024-08-20 07:08:11,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-08-20 07:08:39,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4707100.0, ans=0.0 2024-08-20 07:08:42,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4707100.0, ans=0.2 2024-08-20 07:09:00,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.00 vs. limit=15.0 2024-08-20 07:09:16,178 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11350, loss[loss=0.08985, beats_loss=0.00974, ecapa_loss=0.0001791, whisper_loss=0.07831, over 18677.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001393, whisper_loss=0.08929, over 3793635.95 frames. 
], batch size: 77, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:09:30,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4707300.0, ans=0.125 2024-08-20 07:09:36,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.220e+01 2.467e+01 2.786e+01 5.186e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-20 07:09:41,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4707400.0, ans=0.1 2024-08-20 07:09:48,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4707400.0, ans=0.015 2024-08-20 07:09:53,933 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-20 07:10:00,115 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 07:10:18,205 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 07:10:21,546 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2024-08-20 07:10:26,891 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 07:10:49,671 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11400, loss[loss=0.08779, beats_loss=0.01091, ecapa_loss=0.0001839, whisper_loss=0.07505, over 15893.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.000139, whisper_loss=0.08971, over 3819632.92 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:11:01,358 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 07:11:05,237 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 07:11:15,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-20 07:11:23,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4707900.0, ans=0.0 2024-08-20 07:11:26,682 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 07:11:35,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-20 07:11:43,683 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 07:11:46,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4708100.0, ans=0.125 2024-08-20 07:12:09,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4708200.0, ans=0.125 2024-08-20 07:12:10,577 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 36 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 07:12:23,560 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11450, loss[loss=0.09953, beats_loss=0.0123, ecapa_loss=0.0001071, whisper_loss=0.08617, over 21611.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001386, whisper_loss=0.09032, over 3834053.27 frames. 
], batch size: 83, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:12:38,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4708300.0, ans=0.0 2024-08-20 07:12:38,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4708300.0, ans=0.2 2024-08-20 07:12:42,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4708400.0, ans=0.1 2024-08-20 07:12:45,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.318e+01 2.506e+01 2.910e+01 4.315e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-20 07:13:07,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4708500.0, ans=0.0 2024-08-20 07:13:12,374 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 07:13:41,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4708700.0, ans=0.125 2024-08-20 07:13:51,407 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 38 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 07:13:58,853 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11500, loss[loss=0.09958, beats_loss=0.01061, ecapa_loss=0.0001385, whisper_loss=0.08758, over 22610.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.0906, over 3835614.57 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:14:11,778 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 07:14:22,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4708900.0, ans=0.1 2024-08-20 07:14:24,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4708900.0, ans=0.2 2024-08-20 07:14:38,332 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 07:15:10,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4709100.0, ans=0.125 2024-08-20 07:15:13,884 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 07:15:23,903 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 30 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 07:15:39,200 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11550, loss[loss=0.1033, beats_loss=0.009792, ecapa_loss=0.0001346, whisper_loss=0.09217, over 19402.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001388, whisper_loss=0.09056, over 3817183.97 frames. ], batch size: 76, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:15:53,284 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 07:15:54,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4709300.0, ans=0.125 2024-08-20 07:16:01,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.302e+01 2.543e+01 2.812e+01 2.319e+02, threshold=5.086e+01, percent-clipped=3.0 2024-08-20 07:16:04,462 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 19 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 07:16:10,835 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 07:17:15,029 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-20 07:17:25,998 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11600, loss[loss=0.1113, beats_loss=0.01108, ecapa_loss=0.0001264, whisper_loss=0.09895, over 23643.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.09006, over 3832761.64 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:17:26,189 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-20 07:17:32,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4709800.0, ans=0.125 2024-08-20 07:17:44,807 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 07:17:49,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4709900.0, ans=0.125 2024-08-20 07:18:45,854 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2024-08-20 07:18:51,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4710200.0, ans=0.125 2024-08-20 07:19:05,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4710200.0, ans=0.125 2024-08-20 07:19:09,676 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11650, loss[loss=0.09244, beats_loss=0.01131, ecapa_loss=0.000128, whisper_loss=0.07985, over 13864.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.09059, over 3818405.85 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:19:11,987 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-20 07:19:23,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4710300.0, ans=0.125 2024-08-20 07:19:24,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-20 07:19:25,228 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 30 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 07:19:33,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.347e+01 2.610e+01 2.991e+01 4.037e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-20 07:19:37,703 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 07:19:38,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4710400.0, ans=0.125 2024-08-20 07:19:43,903 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 07:19:48,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4710400.0, ans=0.125 2024-08-20 07:19:55,902 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 07:20:22,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4710600.0, ans=0.125 2024-08-20 07:20:23,899 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 07:20:34,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4710700.0, ans=0.125 2024-08-20 07:20:40,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4710700.0, ans=0.025 2024-08-20 07:20:51,475 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11700, loss[loss=0.08984, beats_loss=0.01318, ecapa_loss=0.0001106, whisper_loss=0.07555, over 21475.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.09069, over 3823003.00 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:21:00,713 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-20 07:21:03,991 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 07:21:16,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4710900.0, ans=0.125 2024-08-20 07:21:31,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4711000.0, ans=0.2 2024-08-20 07:21:35,817 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 11 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 07:21:56,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4711100.0, ans=0.125 2024-08-20 07:21:58,399 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.41 vs. 
limit=15.0 2024-08-20 07:22:15,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=22.5 2024-08-20 07:22:32,000 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11750, loss[loss=0.1067, beats_loss=0.009643, ecapa_loss=0.0001348, whisper_loss=0.09575, over 22260.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001393, whisper_loss=0.0914, over 3849346.01 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:22:41,065 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 07:22:50,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4711300.0, ans=0.0 2024-08-20 07:22:55,008 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.306e+01 2.569e+01 2.913e+01 4.079e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-20 07:23:01,387 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 07:23:28,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4711500.0, ans=0.0 2024-08-20 07:24:12,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4711700.0, ans=0.0 2024-08-20 07:24:18,348 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11800, loss[loss=0.08488, beats_loss=0.01162, ecapa_loss=0.0001295, whisper_loss=0.07196, over 13707.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.09067, over 3837036.81 frames. 
], batch size: 54, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:24:34,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-20 07:24:36,780 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 07:24:46,254 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-20 07:25:01,630 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 07:25:09,797 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 16 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-20 07:25:10,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-20 07:25:13,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=15.0 2024-08-20 07:25:17,968 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 07:25:29,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4712100.0, ans=0.2 2024-08-20 07:25:52,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4712200.0, ans=0.95 2024-08-20 07:26:02,563 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11850, loss[loss=0.1031, beats_loss=0.008826, ecapa_loss=0.0001247, whisper_loss=0.09307, over 13965.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001412, whisper_loss=0.09044, over 3835547.87 frames. 
], batch size: 50, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:26:13,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4712300.0, ans=0.125 2024-08-20 07:26:15,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2024-08-20 07:26:21,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4712300.0, ans=0.125 2024-08-20 07:26:26,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.335e+01 2.520e+01 2.883e+01 3.441e+02, threshold=5.040e+01, percent-clipped=1.0 2024-08-20 07:26:35,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4712400.0, ans=0.0 2024-08-20 07:26:41,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-20 07:26:47,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4712500.0, ans=0.0 2024-08-20 07:27:17,666 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 07:27:20,099 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 22 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-20 07:27:38,854 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 07:27:46,167 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11900, loss[loss=0.07735, beats_loss=0.01337, ecapa_loss=0.0001257, whisper_loss=0.06272, over 16326.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001412, whisper_loss=0.08994, over 3819193.69 frames. 
], batch size: 68, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:28:14,917 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 07:28:19,221 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 07:28:21,583 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5 2024-08-20 07:28:23,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4712900.0, ans=0.05 2024-08-20 07:28:35,154 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 07:29:14,889 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.654e+00 2024-08-20 07:29:16,625 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 07:29:22,079 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 07:29:23,019 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 11950, loss[loss=0.1123, beats_loss=0.009104, ecapa_loss=0.0001582, whisper_loss=0.1016, over 22020.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001419, whisper_loss=0.09061, over 3819296.05 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:29:43,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.385e+01 2.581e+01 2.869e+01 3.753e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 07:29:50,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4713400.0, ans=0.0 2024-08-20 07:30:12,737 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
21 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 07:30:24,256 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.076e+00 2024-08-20 07:30:31,503 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 07:30:43,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4713700.0, ans=0.125 2024-08-20 07:30:44,466 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 07:30:44,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4713700.0, ans=0.0 2024-08-20 07:30:50,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4713700.0, ans=0.125 2024-08-20 07:30:57,727 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12000, loss[loss=0.1151, beats_loss=0.01056, ecapa_loss=0.0001044, whisper_loss=0.1035, over 20568.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001418, whisper_loss=0.09033, over 3816279.14 frames. ], batch size: 77, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:30:57,728 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 07:31:33,802 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005087, whisper_loss=0.2481, over 931116.00 frames. 2024-08-20 07:31:56,085 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on SV_voxceleb1: loss=0.003908, beats_loss=0, ecapa_loss=0.0003908, whisper_loss=0, over 944235.00 frames. 2024-08-20 07:33:38,296 INFO [train_multi_KD3.py:1150] (2/4) Epoch 32, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 07:33:38,299 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 07:33:42,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-20 07:34:25,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4714000.0, ans=0.09899494936611666 2024-08-20 07:34:43,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4714100.0, ans=0.125 2024-08-20 07:34:44,285 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 07:34:44,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4714100.0, ans=0.125 2024-08-20 07:34:44,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4714100.0, ans=0.04949747468305833 2024-08-20 07:34:50,526 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.311e+05 2024-08-20 07:35:01,242 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-20 07:35:06,507 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12050, loss[loss=0.08569, beats_loss=0.01088, ecapa_loss=0.0001426, whisper_loss=0.07339, over 16992.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001413, whisper_loss=0.08987, over 3807820.65 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:35:12,261 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4714300.0, ans=0.125 2024-08-20 07:35:14,704 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
13 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 07:35:24,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4714400.0, ans=0.125 2024-08-20 07:35:25,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.124e+01 2.460e+01 2.784e+01 4.386e+01, threshold=4.920e+01, percent-clipped=0.0 2024-08-20 07:35:25,916 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 07:35:29,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4714400.0, ans=0.05 2024-08-20 07:35:40,429 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 07:35:47,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4714500.0, ans=0.2 2024-08-20 07:35:57,503 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 07:36:38,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4714800.0, ans=0.0 2024-08-20 07:36:39,074 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12100, loss[loss=0.09357, beats_loss=0.01067, ecapa_loss=0.0001583, whisper_loss=0.08132, over 21015.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001418, whisper_loss=0.09061, over 3837859.00 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:36:42,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4714800.0, ans=0.0 2024-08-20 07:36:54,445 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 07:37:20,641 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 07:38:00,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4715100.0, ans=0.0 2024-08-20 07:38:17,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4715200.0, ans=0.125 2024-08-20 07:38:23,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4715300.0, ans=0.1 2024-08-20 07:38:23,891 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12150, loss[loss=0.1118, beats_loss=0.009515, ecapa_loss=0.0001903, whisper_loss=0.1004, over 17723.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.09048, over 3844239.89 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:38:39,542 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 07:38:41,529 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 07:38:42,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4715300.0, ans=0.1 2024-08-20 07:38:47,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.297e+01 2.567e+01 2.958e+01 5.999e+01, threshold=5.133e+01, percent-clipped=2.0 2024-08-20 07:38:48,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4715400.0, ans=0.125 2024-08-20 07:38:48,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.98 vs. 
limit=15.0 2024-08-20 07:39:08,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5 2024-08-20 07:39:31,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4715600.0, ans=0.125 2024-08-20 07:39:56,351 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 22 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 07:39:58,941 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12200, loss[loss=0.1078, beats_loss=0.009418, ecapa_loss=0.0001267, whisper_loss=0.09716, over 17425.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001427, whisper_loss=0.09064, over 3845802.55 frames. ], batch size: 68, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:40:01,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4715800.0, ans=0.125 2024-08-20 07:40:13,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4715800.0, ans=0.125 2024-08-20 07:40:24,615 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. limit=10.0 2024-08-20 07:40:35,794 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 07:40:36,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4716000.0, ans=0.1 2024-08-20 07:40:45,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4716000.0, ans=0.125 2024-08-20 07:40:48,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4716000.0, ans=0.0 2024-08-20 07:40:51,276 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 07:40:58,404 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 07:41:22,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4716200.0, ans=0.0 2024-08-20 07:41:25,574 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12250, loss[loss=0.08411, beats_loss=0.008593, ecapa_loss=0.0001529, whisper_loss=0.07399, over 21588.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001423, whisper_loss=0.08991, over 3816836.03 frames. 
], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:41:26,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4716300.0, ans=0.125 2024-08-20 07:41:42,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4716400.0, ans=0.125 2024-08-20 07:41:44,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4716400.0, ans=0.0 2024-08-20 07:41:45,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.278e+01 2.601e+01 2.929e+01 4.392e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-20 07:41:51,350 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 07:42:00,792 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 07:42:01,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-20 07:42:11,084 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=12.0 2024-08-20 07:42:22,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4716600.0, ans=0.125 2024-08-20 07:42:28,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4716600.0, ans=0.1 2024-08-20 07:42:35,519 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 07:42:46,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4716700.0, ans=0.0 2024-08-20 07:42:50,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4716700.0, ans=0.125 2024-08-20 07:42:53,962 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12300, loss[loss=0.09031, beats_loss=0.01196, ecapa_loss=0.0001459, whisper_loss=0.07689, over 16735.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001409, whisper_loss=0.08978, over 3803230.70 frames. ], batch size: 68, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:42:58,661 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.584e+00 2024-08-20 07:43:10,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4716900.0, ans=0.125 2024-08-20 07:43:19,379 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 07:43:28,640 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 07:43:49,286 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-08-20 07:43:55,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-20 07:44:14,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=15.0 2024-08-20 07:44:19,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-08-20 07:44:20,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4717200.0, ans=0.1 2024-08-20 07:44:23,195 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12350, loss[loss=0.1213, beats_loss=0.009265, ecapa_loss=0.0001385, whisper_loss=0.1107, over 22659.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.08944, over 3803388.43 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:44:44,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.339e+01 2.566e+01 2.988e+01 1.086e+02, threshold=5.133e+01, percent-clipped=1.0 2024-08-20 07:45:04,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4717500.0, ans=0.125 2024-08-20 07:45:41,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4717700.0, ans=0.125 2024-08-20 07:45:47,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4717700.0, ans=0.0 2024-08-20 07:45:55,826 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12400, loss[loss=0.1456, beats_loss=0.006939, ecapa_loss=0.0001408, whisper_loss=0.1373, over 15763.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.08986, over 3837413.26 frames. 
], batch size: 60, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:46:10,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4717800.0, ans=0.1 2024-08-20 07:46:12,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4717900.0, ans=0.2 2024-08-20 07:46:17,591 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.30 vs. limit=22.5 2024-08-20 07:46:34,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4718000.0, ans=0.2 2024-08-20 07:46:48,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4718100.0, ans=0.1 2024-08-20 07:46:50,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4718100.0, ans=0.1 2024-08-20 07:47:04,127 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 07:47:08,371 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4718200.0, ans=0.0 2024-08-20 07:47:22,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4718200.0, ans=0.1 2024-08-20 07:47:24,630 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12450, loss[loss=0.1104, beats_loss=0.009575, ecapa_loss=0.0001699, whisper_loss=0.09911, over 22322.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001411, whisper_loss=0.09017, over 3819288.91 frames. 
], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:47:25,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4718300.0, ans=0.125 2024-08-20 07:47:31,933 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 07:47:43,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.285e+01 2.465e+01 2.724e+01 4.543e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-20 07:48:02,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4718500.0, ans=0.125 2024-08-20 07:48:15,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4718500.0, ans=0.1 2024-08-20 07:48:28,461 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-20 07:48:46,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4718700.0, ans=0.125 2024-08-20 07:48:46,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4718700.0, ans=0.0 2024-08-20 07:48:57,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4718800.0, ans=0.125 2024-08-20 07:48:58,435 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12500, loss[loss=0.08682, beats_loss=0.01227, ecapa_loss=0.0001855, whisper_loss=0.0727, over 20645.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001403, whisper_loss=0.08998, over 3825480.32 frames. 
], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:49:05,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4718800.0, ans=0.125 2024-08-20 07:49:43,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4719000.0, ans=0.125 2024-08-20 07:49:45,268 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-20 07:49:49,584 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 07:49:51,365 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 07:50:04,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4719100.0, ans=0.2 2024-08-20 07:50:27,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4719300.0, ans=0.2 2024-08-20 07:50:28,506 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12550, loss[loss=0.09022, beats_loss=0.01242, ecapa_loss=0.0001254, whisper_loss=0.07655, over 17602.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.0001407, whisper_loss=0.08901, over 3769053.66 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:50:32,673 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 07:50:48,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.246e+01 2.517e+01 2.953e+01 4.531e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-20 07:50:55,494 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 21 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-20 07:51:04,183 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
18 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 07:51:04,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4719500.0, ans=0.125 2024-08-20 07:51:16,630 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 07:51:25,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-20 07:51:34,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4719600.0, ans=0.125 2024-08-20 07:51:38,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4719600.0, ans=0.2 2024-08-20 07:51:53,060 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 07:51:59,659 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12600, loss[loss=0.09948, beats_loss=0.01103, ecapa_loss=0.0001074, whisper_loss=0.08737, over 20341.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001409, whisper_loss=0.08913, over 3801028.08 frames. ], batch size: 78, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:52:03,896 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
25 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 07:52:33,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4719900.0, ans=0.125 2024-08-20 07:52:43,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4720000.0, ans=0.0 2024-08-20 07:52:43,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4720000.0, ans=0.5 2024-08-20 07:52:54,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.37 vs. limit=15.0 2024-08-20 07:53:07,769 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-08-20 07:53:09,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4720100.0, ans=0.2 2024-08-20 07:53:20,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4720200.0, ans=0.0 2024-08-20 07:53:33,459 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12650, loss[loss=0.1059, beats_loss=0.01001, ecapa_loss=0.000145, whisper_loss=0.09448, over 21828.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.08953, over 3802853.91 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:53:39,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4720300.0, ans=0.0 2024-08-20 07:53:48,657 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 07:53:53,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.384e+01 2.676e+01 2.977e+01 1.190e+02, threshold=5.353e+01, percent-clipped=5.0 2024-08-20 07:54:05,569 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-20 07:54:07,023 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 07:54:08,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4720500.0, ans=0.125 2024-08-20 07:54:23,177 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 07:55:02,705 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12700, loss[loss=0.1538, beats_loss=0.006575, ecapa_loss=0.0001288, whisper_loss=0.1459, over 14520.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.08967, over 3794399.44 frames. ], batch size: 53, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:55:12,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4720800.0, ans=0.0 2024-08-20 07:55:22,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-08-20 07:55:25,149 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 07:55:30,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4720900.0, ans=0.0 2024-08-20 07:55:43,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0 2024-08-20 07:55:48,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4721000.0, ans=0.125 2024-08-20 07:55:56,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4721100.0, ans=0.125 2024-08-20 07:56:00,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4721100.0, ans=0.0 2024-08-20 07:56:04,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0 2024-08-20 07:56:07,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4721100.0, ans=0.125 2024-08-20 07:56:22,467 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-20 07:56:34,115 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12750, loss[loss=0.1096, beats_loss=0.007113, ecapa_loss=0.000127, whisper_loss=0.1012, over 16983.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.08959, over 3805172.03 frames. ], batch size: 62, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:56:36,676 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-20 07:56:43,662 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
15 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 07:56:45,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4721300.0, ans=0.05 2024-08-20 07:56:50,343 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 07:56:52,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.492e+01 2.698e+01 4.644e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 07:57:19,484 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 07:58:02,308 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12800, loss[loss=0.1033, beats_loss=0.01321, ecapa_loss=0.0001129, whisper_loss=0.08893, over 17125.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.08991, over 3794508.58 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:58:19,372 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 07:58:25,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4721900.0, ans=0.0 2024-08-20 07:58:26,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4721900.0, ans=0.0 2024-08-20 07:58:43,740 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 07:59:05,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.27 vs. 
limit=15.0 2024-08-20 07:59:15,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4722100.0, ans=0.0 2024-08-20 07:59:30,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4722200.0, ans=0.1 2024-08-20 07:59:36,531 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12850, loss[loss=0.09941, beats_loss=0.008366, ecapa_loss=0.0002006, whisper_loss=0.08904, over 18092.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001409, whisper_loss=0.09077, over 3810653.40 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:59:45,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4722300.0, ans=0.125 2024-08-20 07:59:50,382 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 23 from LS+wenet, 16 from Vox, 12 fro AS 2024-08-20 07:59:56,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.213e+01 2.468e+01 2.785e+01 8.515e+01, threshold=4.935e+01, percent-clipped=2.0 2024-08-20 08:00:05,714 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 08:00:18,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4722500.0, ans=0.04949747468305833 2024-08-20 08:00:22,101 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:00:24,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4722500.0, ans=0.09899494936611666 2024-08-20 08:00:44,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4722600.0, ans=0.125 2024-08-20 08:01:01,109 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 08:01:04,943 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12900, loss[loss=0.09612, beats_loss=0.01129, ecapa_loss=0.0001583, whisper_loss=0.08324, over 21704.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.08983, over 3764361.56 frames. 
], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:01:11,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4722800.0, ans=0.0 2024-08-20 08:01:13,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4722800.0, ans=0.1 2024-08-20 08:01:23,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4722900.0, ans=0.125 2024-08-20 08:01:33,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4722900.0, ans=0.1 2024-08-20 08:02:03,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4723100.0, ans=0.0 2024-08-20 08:02:04,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4723100.0, ans=0.05 2024-08-20 08:02:19,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4723200.0, ans=0.0 2024-08-20 08:02:35,022 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 12950, loss[loss=0.09477, beats_loss=0.00768, ecapa_loss=0.0001967, whisper_loss=0.08512, over 14596.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001406, whisper_loss=0.08921, over 3790952.45 frames. 
], batch size: 61, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:02:39,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4723300.0, ans=0.0 2024-08-20 08:02:56,158 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.320e+01 2.461e+01 2.898e+01 1.890e+02, threshold=4.922e+01, percent-clipped=4.0 2024-08-20 08:03:18,540 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 18 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-20 08:03:32,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4723600.0, ans=0.2 2024-08-20 08:03:32,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4723600.0, ans=0.1 2024-08-20 08:03:36,553 WARNING [optim.py:496] (2/4) Scaling gradients by 0.012246229685842991, model_norm_threshold=49.221561431884766 2024-08-20 08:03:36,719 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.214e+06, grad_sumsq=3.009e+08, orig_rms_sq=1.068e-02 2024-08-20 08:03:59,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4723700.0, ans=0.125 2024-08-20 08:04:01,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=12.0 2024-08-20 08:04:07,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4723800.0, ans=0.125 2024-08-20 08:04:07,926 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13000, loss[loss=0.09905, beats_loss=0.01216, ecapa_loss=9.31e-05, whisper_loss=0.08596, over 17508.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.000141, whisper_loss=0.08987, over 3803620.81 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:04:34,545 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 08:04:50,751 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 08:05:03,279 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 22 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 08:05:05,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4724100.0, ans=0.0 2024-08-20 08:05:07,792 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 08:05:20,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4724100.0, ans=0.0 2024-08-20 08:05:31,409 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 28 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 08:05:39,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-20 08:05:41,807 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13050, loss[loss=0.09085, beats_loss=0.01158, ecapa_loss=0.0001244, whisper_loss=0.07802, over 22571.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001411, whisper_loss=0.08981, over 3784421.52 frames. 
], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:05:45,790 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03136618062853813, model_norm_threshold=49.221561431884766 2024-08-20 08:05:45,954 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.784e+05, grad_sumsq=1.450e+05, orig_rms_sq=3.300e+00 2024-08-20 08:05:46,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.61 vs. limit=10.0 2024-08-20 08:05:58,248 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 08:06:03,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.318e+01 2.513e+01 2.841e+01 4.019e+03, threshold=5.026e+01, percent-clipped=3.0 2024-08-20 08:06:26,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4724500.0, ans=0.125 2024-08-20 08:06:32,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4724500.0, ans=0.0 2024-08-20 08:07:06,607 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-20 08:07:11,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4724700.0, ans=0.125 2024-08-20 08:07:13,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.40 vs. limit=22.5 2024-08-20 08:07:13,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. 
limit=6.0 2024-08-20 08:07:21,155 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13100, loss[loss=0.1012, beats_loss=0.008503, ecapa_loss=0.0001748, whisper_loss=0.09092, over 17791.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01064, ecapa_loss=0.000142, whisper_loss=0.08908, over 3798900.33 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:07:23,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4724800.0, ans=0.0 2024-08-20 08:07:37,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4724900.0, ans=0.2 2024-08-20 08:07:39,172 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 08:07:47,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4724900.0, ans=0.2 2024-08-20 08:07:47,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4724900.0, ans=0.07 2024-08-20 08:07:56,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4725000.0, ans=0.0 2024-08-20 08:08:02,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-20 08:08:12,029 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 08:08:28,423 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 08:08:54,637 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13150, loss[loss=0.09796, beats_loss=0.009945, ecapa_loss=0.000136, whisper_loss=0.08665, over 23036.00 frames. 
], tot_loss[loss=0.1005, beats_loss=0.01059, ecapa_loss=0.0001417, whisper_loss=0.08853, over 3764759.85 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:09:04,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4725300.0, ans=0.125 2024-08-20 08:09:16,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+01 2.265e+01 2.500e+01 2.860e+01 8.543e+01, threshold=5.000e+01, percent-clipped=2.0 2024-08-20 08:09:36,715 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 08:09:50,522 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:09:56,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4725600.0, ans=0.0 2024-08-20 08:09:58,797 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 08:10:03,426 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.62 vs. limit=22.5 2024-08-20 08:10:23,730 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 34 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 08:10:31,884 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13200, loss[loss=0.1236, beats_loss=0.008313, ecapa_loss=0.0001361, whisper_loss=0.1139, over 20650.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.000142, whisper_loss=0.08995, over 3794732.10 frames. ], batch size: 77, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:10:55,855 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-20 08:11:19,714 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 08:11:29,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4726100.0, ans=0.0 2024-08-20 08:11:31,430 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 08:11:53,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4726200.0, ans=0.1 2024-08-20 08:12:05,809 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13250, loss[loss=0.08932, beats_loss=0.01195, ecapa_loss=0.0001302, whisper_loss=0.07607, over 21888.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.0896, over 3804314.33 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:12:09,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4726300.0, ans=0.125 2024-08-20 08:12:26,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.372e+01 2.599e+01 3.015e+01 7.004e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-20 08:12:42,875 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 08:12:45,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4726500.0, ans=0.125 2024-08-20 08:12:58,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4726500.0, ans=0.125 2024-08-20 08:13:04,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4726600.0, ans=0.0 2024-08-20 08:13:15,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.72 vs. 
limit=12.0 2024-08-20 08:13:17,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4726600.0, ans=0.0 2024-08-20 08:13:18,761 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 08:13:33,678 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 08:13:39,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4726800.0, ans=0.0 2024-08-20 08:13:40,489 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13300, loss[loss=0.1213, beats_loss=0.008268, ecapa_loss=9.339e-05, whisper_loss=0.1121, over 15131.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.000141, whisper_loss=0.08932, over 3797528.15 frames. ], batch size: 53, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:13:43,782 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-20 08:14:02,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4726900.0, ans=0.125 2024-08-20 08:14:10,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4726900.0, ans=0.1 2024-08-20 08:14:10,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.50 vs. 
limit=15.0 2024-08-20 08:14:21,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4727000.0, ans=0.95 2024-08-20 08:15:00,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4727200.0, ans=0.1 2024-08-20 08:15:05,266 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 08:15:14,035 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13350, loss[loss=0.09635, beats_loss=0.01239, ecapa_loss=0.0001203, whisper_loss=0.08276, over 19804.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.000141, whisper_loss=0.09007, over 3768921.81 frames. ], batch size: 79, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:15:28,019 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 08:15:34,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.328e+01 2.528e+01 2.746e+01 3.166e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-20 08:15:40,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4727400.0, ans=0.1 2024-08-20 08:15:41,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. 
limit=15.0 2024-08-20 08:15:44,623 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4727400.0, ans=0.125 2024-08-20 08:15:57,561 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:15:57,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4727500.0, ans=0.0 2024-08-20 08:16:03,979 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 34 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-20 08:16:19,393 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2024-08-20 08:16:41,162 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 08:16:46,251 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13400, loss[loss=0.093, beats_loss=0.0111, ecapa_loss=0.0001717, whisper_loss=0.08018, over 22395.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001408, whisper_loss=0.09014, over 3762490.23 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:16:54,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4727800.0, ans=0.125 2024-08-20 08:17:17,561 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 14 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 08:17:17,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4727900.0, ans=0.125 2024-08-20 08:17:19,369 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 35 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 08:17:22,780 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
28 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 08:17:26,930 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 21 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-20 08:17:43,131 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 21 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-20 08:17:50,740 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 27 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 08:17:58,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4728200.0, ans=0.0 2024-08-20 08:18:05,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4728200.0, ans=0.0 2024-08-20 08:18:07,243 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 08:18:17,770 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13450, loss[loss=0.1019, beats_loss=0.008344, ecapa_loss=0.0001581, whisper_loss=0.09195, over 14897.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01029, ecapa_loss=0.0001414, whisper_loss=0.08973, over 3743029.89 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:18:39,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.294e+01 2.576e+01 2.808e+01 3.727e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-20 08:18:57,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4728500.0, ans=0.0 2024-08-20 08:18:57,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4728500.0, ans=0.0 2024-08-20 08:19:20,882 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
31 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 08:19:51,639 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13500, loss[loss=0.1059, beats_loss=0.009758, ecapa_loss=0.0001306, whisper_loss=0.09479, over 23224.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.08948, over 3726573.25 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:20:13,578 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 26 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 08:21:26,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-20 08:21:29,959 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 08:21:31,020 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13550, loss[loss=0.1049, beats_loss=0.01163, ecapa_loss=0.0001263, whisper_loss=0.09203, over 17135.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001396, whisper_loss=0.0898, over 3770367.42 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:21:36,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-20 08:21:52,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.276e+01 2.469e+01 2.814e+01 4.223e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-20 08:21:56,332 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 08:21:56,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4729400.0, ans=0.125 2024-08-20 08:22:10,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-08-20 08:22:11,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4729500.0, ans=0.1 2024-08-20 08:22:21,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4729500.0, ans=0.2 2024-08-20 08:22:28,062 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-20 08:22:48,052 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 08:23:05,601 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13600, loss[loss=0.09626, beats_loss=0.01157, ecapa_loss=0.0001259, whisper_loss=0.08343, over 16471.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001404, whisper_loss=0.09009, over 3764759.95 frames. ], batch size: 64, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:23:31,469 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
32 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-20 08:23:53,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4730000.0, ans=0.1 2024-08-20 08:23:56,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4730000.0, ans=0.125 2024-08-20 08:24:02,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4730100.0, ans=0.2 2024-08-20 08:24:07,762 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 08:24:26,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.33 vs. limit=10.0 2024-08-20 08:24:42,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4730300.0, ans=0.2 2024-08-20 08:24:43,164 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13650, loss[loss=0.09807, beats_loss=0.01133, ecapa_loss=0.0001449, whisper_loss=0.08528, over 20497.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001409, whisper_loss=0.08995, over 3781481.43 frames. ], batch size: 83, lr: 1.90e-03, grad_scale: 1.152921504606847e+18 2024-08-20 08:24:57,484 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 08:25:03,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.366e+01 2.588e+01 2.806e+01 4.523e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-20 08:25:17,188 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
18 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-20 08:25:21,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4730500.0, ans=0.07 2024-08-20 08:25:29,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4730500.0, ans=0.125 2024-08-20 08:25:30,139 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-20 08:25:53,880 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 22 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-20 08:26:15,097 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13700, loss[loss=0.09868, beats_loss=0.01198, ecapa_loss=0.0001048, whisper_loss=0.08565, over 14768.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.08901, over 3777266.65 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 1.152921504606847e+18 2024-08-20 08:26:17,314 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 14 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 08:26:20,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4730800.0, ans=0.0 2024-08-20 08:26:35,557 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 08:26:43,346 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 08:26:46,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=12.0 2024-08-20 08:26:52,416 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 08:27:03,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4731000.0, ans=10.0 2024-08-20 08:27:12,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4731100.0, ans=0.0 2024-08-20 08:27:17,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4731100.0, ans=0.2 2024-08-20 08:27:22,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4731100.0, ans=0.0 2024-08-20 08:27:24,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4731100.0, ans=0.2 2024-08-20 08:27:37,938 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 08:27:46,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4731200.0, ans=0.125 2024-08-20 08:27:48,702 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13750, loss[loss=0.1119, beats_loss=0.01189, ecapa_loss=0.0001117, whisper_loss=0.09885, over 22176.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001402, whisper_loss=0.08917, over 3792187.76 frames. ], batch size: 87, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:27:52,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-20 08:27:52,902 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 08:27:56,841 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 08:27:59,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4731300.0, ans=0.1 2024-08-20 08:28:00,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4731300.0, ans=0.125 2024-08-20 08:28:05,599 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 25 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 08:28:10,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.242e+01 2.511e+01 2.832e+01 4.850e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-20 08:28:22,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4731400.0, ans=0.2 2024-08-20 08:28:28,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4731500.0, ans=0.0 2024-08-20 08:28:29,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4731500.0, ans=0.125 2024-08-20 08:28:34,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4731500.0, ans=0.0 2024-08-20 08:29:13,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4731700.0, ans=0.0 2024-08-20 08:29:15,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4731700.0, ans=0.125 2024-08-20 08:29:19,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4731700.0, ans=0.125 2024-08-20 08:29:22,250 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13800, loss[loss=0.08482, beats_loss=0.01231, ecapa_loss=0.000172, whisper_loss=0.07079, over 
21342.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.08958, over 3820035.84 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:29:50,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4731900.0, ans=10.0 2024-08-20 08:30:11,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4732000.0, ans=0.125 2024-08-20 08:30:46,930 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 08:30:54,832 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13850, loss[loss=0.1127, beats_loss=0.008828, ecapa_loss=0.0001392, whisper_loss=0.1025, over 14433.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01035, ecapa_loss=0.0001402, whisper_loss=0.08957, over 3803641.49 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:31:11,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2024-08-20 08:31:15,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.631e+01 2.257e+01 2.391e+01 2.623e+01 3.979e+01, threshold=4.782e+01, percent-clipped=0.0 2024-08-20 08:31:49,006 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:32:03,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.03 vs. 
limit=22.5 2024-08-20 08:32:08,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4732700.0, ans=0.1 2024-08-20 08:32:26,083 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13900, loss[loss=0.1241, beats_loss=0.008307, ecapa_loss=0.0001377, whisper_loss=0.1144, over 23020.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001408, whisper_loss=0.0896, over 3791771.15 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:32:26,727 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 33 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 08:32:28,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4732800.0, ans=0.2 2024-08-20 08:32:36,977 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 08:33:00,357 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 08:33:02,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4733000.0, ans=0.125 2024-08-20 08:33:15,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.62 vs. limit=22.5 2024-08-20 08:33:27,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4733100.0, ans=0.125 2024-08-20 08:33:35,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4733100.0, ans=0.0 2024-08-20 08:33:55,680 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 13950, loss[loss=0.09752, beats_loss=0.01071, ecapa_loss=0.0001517, whisper_loss=0.08529, over 22615.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.09029, over 3791485.72 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:33:58,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4733300.0, ans=0.0 2024-08-20 08:34:07,903 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 08:34:11,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4733300.0, ans=0.2 2024-08-20 08:34:17,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.254e+01 2.490e+01 2.803e+01 3.566e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 08:34:25,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4733400.0, ans=15.0 2024-08-20 08:34:56,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4733600.0, ans=0.125 2024-08-20 08:35:02,690 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-20 08:35:15,109 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 08:35:26,933 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14000, loss[loss=0.111, beats_loss=0.01142, ecapa_loss=0.0001346, whisper_loss=0.09822, over 17737.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.09056, over 3819286.94 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:35:49,498 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 
17 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-20 08:36:17,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4734000.0, ans=0.125 2024-08-20 08:36:35,883 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-20 08:36:37,721 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 08:36:39,693 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 08:37:00,704 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14050, loss[loss=0.1089, beats_loss=0.01199, ecapa_loss=0.0001366, whisper_loss=0.09556, over 20116.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.09099, over 3794394.39 frames. ], batch size: 81, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:37:22,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.283e+01 2.494e+01 2.922e+01 5.594e+01, threshold=4.987e+01, percent-clipped=2.0 2024-08-20 08:37:40,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4734500.0, ans=0.05 2024-08-20 08:37:46,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=4734500.0, ans=0.1 2024-08-20 08:37:54,221 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4734500.0, ans=0.1 2024-08-20 08:38:16,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4734700.0, ans=0.1 2024-08-20 08:38:31,622 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14100, loss[loss=0.09261, beats_loss=0.01323, ecapa_loss=0.0001166, whisper_loss=0.07821, over 23407.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001397, whisper_loss=0.09079, over 3799426.91 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:38:48,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4734900.0, ans=0.0 2024-08-20 08:38:55,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4734900.0, ans=0.125 2024-08-20 08:39:05,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4734900.0, ans=0.0 2024-08-20 08:39:18,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4735000.0, ans=0.0 2024-08-20 08:39:37,240 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 08:39:46,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-20 08:39:55,690 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-08-20 08:39:57,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2024-08-20 08:40:01,884 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14150, loss[loss=0.08468, beats_loss=0.01127, ecapa_loss=0.000121, whisper_loss=0.07219, over 19097.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.09056, over 3779327.71 frames. 
], batch size: 79, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:40:02,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4735300.0, ans=0.035 2024-08-20 08:40:21,757 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 34 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 08:40:22,947 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.235e+01 2.465e+01 2.825e+01 7.434e+01, threshold=4.929e+01, percent-clipped=1.0 2024-08-20 08:40:44,981 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 08:41:03,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4735600.0, ans=0.2 2024-08-20 08:41:20,437 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 08:41:30,422 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14200, loss[loss=0.1023, beats_loss=0.009923, ecapa_loss=0.0001298, whisper_loss=0.0911, over 22201.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.09098, over 3770274.78 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:41:32,245 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 15 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-20 08:41:36,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4735800.0, ans=0.125 2024-08-20 08:42:28,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-08-20 08:42:32,609 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 08:42:36,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4736100.0, ans=0.2 2024-08-20 08:42:45,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4736200.0, ans=0.0 2024-08-20 08:42:54,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=4736200.0, ans=0.2 2024-08-20 08:43:02,143 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14250, loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001196, whisper_loss=0.09174, over 14357.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01029, ecapa_loss=0.0001394, whisper_loss=0.09081, over 3789352.11 frames. ], batch size: 54, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:43:08,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4736300.0, ans=0.05 2024-08-20 08:43:09,821 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 08:43:24,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.313e+01 2.520e+01 2.754e+01 4.470e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-20 08:43:37,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4736400.0, ans=0.125 2024-08-20 08:44:04,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4736600.0, ans=0.125 2024-08-20 08:44:06,380 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 08:44:14,974 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 08:44:18,122 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 08:44:35,088 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14300, loss[loss=0.07343, beats_loss=0.01216, ecapa_loss=0.0001576, whisper_loss=0.05969, over 20285.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001397, whisper_loss=0.09085, over 3794733.74 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:44:35,731 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-20 08:44:38,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4736800.0, ans=0.125 2024-08-20 08:44:45,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4736800.0, ans=0.125 2024-08-20 08:44:46,071 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-20 08:45:05,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4736900.0, ans=0.1 2024-08-20 08:45:05,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4736900.0, ans=10.0 2024-08-20 08:45:07,485 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 34 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 08:45:11,536 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 14 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-20 08:45:11,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4737000.0, ans=0.125 2024-08-20 08:45:19,553 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 08:45:23,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-08-20 08:45:35,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4737100.0, ans=0.1 2024-08-20 08:45:38,362 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 20 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-20 08:45:45,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4737100.0, ans=0.2 2024-08-20 08:45:49,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4737200.0, ans=0.0 2024-08-20 08:46:05,493 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14350, loss[loss=0.1045, beats_loss=0.01009, ecapa_loss=0.0001688, whisper_loss=0.09275, over 20624.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001409, whisper_loss=0.09015, over 3809345.36 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:46:26,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.359e+01 2.648e+01 3.006e+01 2.772e+02, threshold=5.296e+01, percent-clipped=2.0 2024-08-20 08:46:28,909 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 16 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 08:46:30,579 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
15 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 08:46:46,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4737500.0, ans=0.125 2024-08-20 08:46:48,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4737500.0, ans=0.2 2024-08-20 08:47:05,711 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.92 vs. limit=10.0 2024-08-20 08:47:32,699 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14400, loss[loss=0.0994, beats_loss=0.0119, ecapa_loss=0.0001475, whisper_loss=0.08603, over 14365.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.000141, whisper_loss=0.09103, over 3809395.54 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:47:45,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=22.5 2024-08-20 08:47:45,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.76 vs. limit=5.0 2024-08-20 08:47:46,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4737800.0, ans=0.0 2024-08-20 08:47:49,780 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 08:47:49,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4737900.0, ans=0.0 2024-08-20 08:48:19,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. 
limit=15.0 2024-08-20 08:48:28,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4738100.0, ans=0.05 2024-08-20 08:48:33,518 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 26 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 08:48:44,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4738200.0, ans=0.125 2024-08-20 08:48:56,908 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 08:48:58,866 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 08:49:03,615 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14450, loss[loss=0.06919, beats_loss=0.01375, ecapa_loss=9.47e-05, whisper_loss=0.0545, over 17265.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01036, ecapa_loss=0.0001403, whisper_loss=0.09136, over 3795832.69 frames. ], batch size: 68, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:49:07,325 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 08:49:09,870 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2024-08-20 08:49:13,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4738300.0, ans=0.125 2024-08-20 08:49:24,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.293e+01 2.479e+01 2.732e+01 7.579e+01, threshold=4.957e+01, percent-clipped=1.0 2024-08-20 08:49:27,829 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 08:49:33,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4738400.0, ans=0.0 2024-08-20 08:49:59,658 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 08:50:23,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-20 08:50:34,412 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 08:50:36,050 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 08:50:37,004 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14500, loss[loss=0.1124, beats_loss=0.01118, ecapa_loss=0.0001375, whisper_loss=0.09988, over 23552.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.09145, over 3829181.16 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:50:44,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4738800.0, ans=0.2 2024-08-20 08:51:02,036 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2024-08-20 08:51:05,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4738900.0, ans=0.125 2024-08-20 08:51:46,576 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 36 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 08:51:50,892 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
24 from LS+wenet, 18 from Vox, 48 from AS 2024-08-20 08:51:52,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4739200.0, ans=0.125 2024-08-20 08:51:58,063 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4739200.0, ans=0.125 2024-08-20 08:52:03,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4739200.0, ans=0.1 2024-08-20 08:52:11,787 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14550, loss[loss=0.08545, beats_loss=0.009921, ecapa_loss=0.0001333, whisper_loss=0.07419, over 18173.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.09086, over 3839695.20 frames. ], batch size: 75, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:52:18,243 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-20 08:52:24,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4739300.0, ans=0.04949747468305833 2024-08-20 08:52:34,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.256e+01 2.477e+01 2.723e+01 4.705e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 08:52:41,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=4739400.0, ans=0.02 2024-08-20 08:52:41,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4739400.0, ans=0.2 2024-08-20 08:53:06,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4739600.0, ans=0.0 2024-08-20 08:53:09,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, 
num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2024-08-20 08:53:10,287 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2024-08-20 08:53:39,374 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS 2024-08-20 08:53:41,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4739700.0, ans=0.015 2024-08-20 08:53:44,327 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14600, loss[loss=0.106, beats_loss=0.008507, ecapa_loss=0.000135, whisper_loss=0.0961, over 16979.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.09057, over 3857828.02 frames. ], batch size: 66, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:54:01,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4739800.0, ans=0.125 2024-08-20 08:54:04,648 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
42 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 08:54:25,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4740000.0, ans=0.125 2024-08-20 08:54:34,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4740000.0, ans=0.2 2024-08-20 08:54:47,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4740100.0, ans=0.125 2024-08-20 08:54:47,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4740100.0, ans=0.2 2024-08-20 08:55:00,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-20 08:55:02,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-20 08:55:12,526 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 16 from Vox, 34 from AS 2024-08-20 08:55:16,418 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14650, loss[loss=0.1133, beats_loss=0.01068, ecapa_loss=0.0001457, whisper_loss=0.1012, over 17319.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.000138, whisper_loss=0.09014, over 3889667.99 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:55:22,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4740300.0, ans=10.0 2024-08-20 08:55:27,780 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
22 from LS+wenet, 23 from Vox, 44 from AS 2024-08-20 08:55:38,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.329e+01 2.529e+01 2.848e+01 4.887e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 08:55:47,979 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:55:53,553 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 from AS 2024-08-20 08:56:14,521 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 24 from LS+wenet, 15 from Vox, 23 from AS 2024-08-20 08:56:18,723 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:56:41,924 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 from AS 2024-08-20 08:56:45,540 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14700, loss[loss=0.1141, beats_loss=0.009364, ecapa_loss=0.000163, whisper_loss=0.1032, over 14515.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001394, whisper_loss=0.08963, over 3869744.81 frames. ], batch size: 56, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:56:50,349 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 from AS 2024-08-20 08:56:57,141 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 from AS 2024-08-20 08:56:59,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4740800.0, ans=0.125 2024-08-20 08:57:25,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2024-08-20 08:57:30,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4741000.0, ans=0.125 2024-08-20 08:57:33,903 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 17 from Vox, 18 from AS 2024-08-20 08:57:53,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4741100.0, ans=0.0 2024-08-20 08:57:55,481 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 31 from Vox, 36 from AS 2024-08-20 08:57:57,104 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-20 08:58:15,523 INFO [train_multi_KD3.py:1117] (2/4) Epoch 32, batch 14750, loss[loss=0.1009, beats_loss=0.01156, ecapa_loss=0.0001588, whisper_loss=0.08778, over 21567.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001392, whisper_loss=0.08936, over 3839924.25 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:58:36,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.385e+01 2.604e+01 3.059e+01 5.323e+01, threshold=5.208e+01, percent-clipped=1.0 2024-08-20 08:59:16,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4741600.0, ans=0.125 2024-08-20 08:59:29,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4741700.0, ans=0.0 2024-08-20 08:59:31,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=10.0 2024-08-20 09:00:12,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4741780.0, ans=0.125 2024-08-20 09:00:13,081 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 0, loss[loss=0.09429, beats_loss=0.009389, ecapa_loss=0.0001559, whisper_loss=0.08334, over 21114.00 frames. ], tot_loss[loss=0.09429, beats_loss=0.009389, ecapa_loss=0.0001559, whisper_loss=0.08334, over 21114.00 frames. ], batch size: 87, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:00:13,081 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 09:00:48,154 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005003, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 09:01:01,688 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8488, 4.4241, 4.7199, 4.7868], device='cuda:2') 2024-08-20 09:01:09,274 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on SV_voxceleb1: loss=0.003963, beats_loss=0, ecapa_loss=0.0003963, whisper_loss=0, over 944235.00 frames. 2024-08-20 09:01:54,973 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.9182, 2.0433, 2.0834, 1.9698, 2.4935, 1.9808, 2.1625, 1.9459], device='cuda:2') 2024-08-20 09:02:40,866 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9969, 1.8005, 1.9032, 1.3140, 1.5987, 2.0462, 2.5423, 1.5133], device='cuda:2') 2024-08-20 09:02:51,320 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 09:02:51,322 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 09:03:11,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4741780.0, ans=0.1 2024-08-20 09:03:32,069 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4741880.0, ans=0.07 2024-08-20 09:03:42,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4741980.0, ans=0.2 2024-08-20 09:03:51,706 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 from AS 2024-08-20 09:03:59,217 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4741980.0, ans=10.0 2024-08-20 09:04:11,232 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 21 from LS+wenet, 6 from Vox, 22 from AS 2024-08-20 09:04:15,552 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 from AS 2024-08-20 09:04:16,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4742080.0, ans=0.125 2024-08-20 09:04:52,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-20 09:04:55,977 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 from AS 2024-08-20 09:04:57,994 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 50, loss[loss=0.1063, beats_loss=0.009964, ecapa_loss=0.0001139, whisper_loss=0.09517, over 19683.00 frames. ], tot_loss[loss=0.09828, beats_loss=0.009381, ecapa_loss=0.0001482, whisper_loss=0.08742, over 862957.69 frames. 
], batch size: 75, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:05:04,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4742280.0, ans=0.0 2024-08-20 09:05:26,904 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 from AS 2024-08-20 09:05:28,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4742380.0, ans=0.0 2024-08-20 09:05:31,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.495e+01 2.772e+01 3.142e+01 4.372e+01, threshold=5.543e+01, percent-clipped=0.0 2024-08-20 09:06:53,974 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 100, loss[loss=0.08395, beats_loss=0.008845, ecapa_loss=0.0001717, whisper_loss=0.07339, over 16906.00 frames. ], tot_loss[loss=0.09791, beats_loss=0.009258, ecapa_loss=0.0001472, whisper_loss=0.08718, over 1471083.55 frames. ], batch size: 69, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:07:34,751 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4742880.0, ans=0.0 2024-08-20 09:07:40,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4742980.0, ans=0.125 2024-08-20 09:07:53,721 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 from AS 2024-08-20 09:07:56,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4742980.0, ans=0.125 2024-08-20 09:08:30,589 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
32 from LS+wenet, 22 from Vox, 35 from AS 2024-08-20 09:08:33,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4743180.0, ans=0.2 2024-08-20 09:08:43,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4743280.0, ans=0.1 2024-08-20 09:08:43,848 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 150, loss[loss=0.109, beats_loss=0.009648, ecapa_loss=0.0001325, whisper_loss=0.09801, over 17507.00 frames. ], tot_loss[loss=0.09861, beats_loss=0.009187, ecapa_loss=0.000146, whisper_loss=0.08796, over 1976805.08 frames. ], batch size: 69, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:08:55,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4743280.0, ans=10.0 2024-08-20 09:09:08,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4743380.0, ans=0.1 2024-08-20 09:09:10,284 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS 2024-08-20 09:09:11,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.463e+01 2.692e+01 3.124e+01 4.669e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-20 09:09:21,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4743480.0, ans=0.2 2024-08-20 09:09:25,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4743480.0, ans=0.1 2024-08-20 09:09:26,394 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
21 from LS+wenet, 30 from Vox, 40 from AS 2024-08-20 09:09:27,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4743480.0, ans=0.125 2024-08-20 09:09:28,777 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 09:09:39,953 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 from AS 2024-08-20 09:09:42,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4743580.0, ans=0.125 2024-08-20 09:10:17,686 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 200, loss[loss=0.09693, beats_loss=0.009647, ecapa_loss=0.0001463, whisper_loss=0.08582, over 20655.00 frames. ], tot_loss[loss=0.09938, beats_loss=0.009407, ecapa_loss=0.0001443, whisper_loss=0.08853, over 2356427.53 frames. ], batch size: 81, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:10:27,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4743780.0, ans=0.2 2024-08-20 09:10:37,719 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 from AS 2024-08-20 09:10:50,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4743880.0, ans=0.125 2024-08-20 09:10:52,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4743980.0, ans=0.2 2024-08-20 09:10:54,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. 
limit=15.0 2024-08-20 09:11:04,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4743980.0, ans=0.0 2024-08-20 09:11:13,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4744080.0, ans=0.07 2024-08-20 09:11:18,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4744080.0, ans=0.125 2024-08-20 09:11:30,127 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS 2024-08-20 09:11:35,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4744180.0, ans=0.125 2024-08-20 09:11:36,897 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 14 from Vox, 40 from AS 2024-08-20 09:11:43,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4744180.0, ans=0.125 2024-08-20 09:11:43,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4744180.0, ans=0.0 2024-08-20 09:11:45,344 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 250, loss[loss=0.097, beats_loss=0.01176, ecapa_loss=0.0001094, whisper_loss=0.08415, over 15139.00 frames. ], tot_loss[loss=0.09939, beats_loss=0.009776, ecapa_loss=0.0001409, whisper_loss=0.08821, over 2664257.71 frames. ], batch size: 56, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:12:06,075 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. 
limit=15.0 2024-08-20 09:12:09,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.364e+01 2.600e+01 2.936e+01 1.943e+02, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 09:12:15,469 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 from AS 2024-08-20 09:12:42,904 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2024-08-20 09:12:55,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4744680.0, ans=0.2 2024-08-20 09:13:00,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4744680.0, ans=0.125 2024-08-20 09:13:03,127 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-08-20 09:13:13,944 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 300, loss[loss=0.09822, beats_loss=0.00935, ecapa_loss=0.0001356, whisper_loss=0.08752, over 17535.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009949, ecapa_loss=0.0001397, whisper_loss=0.08893, over 2897959.34 frames. ], batch size: 68, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:13:17,376 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 from AS 2024-08-20 09:13:21,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4744780.0, ans=0.125 2024-08-20 09:13:52,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4744980.0, ans=0.125 2024-08-20 09:14:05,014 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
18 from LS+wenet, 26 from Vox, 33 from AS 2024-08-20 09:14:09,655 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2024-08-20 09:14:34,822 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 14 from LS+wenet, 27 from Vox, 30 from AS 2024-08-20 09:14:43,501 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 350, loss[loss=0.1019, beats_loss=0.01165, ecapa_loss=0.0001114, whisper_loss=0.08918, over 20058.00 frames. ], tot_loss[loss=0.09951, beats_loss=0.0102, ecapa_loss=0.0001396, whisper_loss=0.08791, over 3062973.02 frames. ], batch size: 77, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:15:01,258 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 from AS 2024-08-20 09:15:08,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.237e+01 2.517e+01 2.824e+01 3.334e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 09:15:18,929 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-20 09:16:09,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4745680.0, ans=0.0 2024-08-20 09:16:15,548 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 400, loss[loss=0.1035, beats_loss=0.009888, ecapa_loss=0.0001753, whisper_loss=0.09187, over 17557.00 frames. ], tot_loss[loss=0.09985, beats_loss=0.01022, ecapa_loss=0.0001394, whisper_loss=0.08824, over 3224826.76 frames. ], batch size: 68, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:16:19,132 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
39 from LS+wenet, 16 from Vox, 38 from AS 2024-08-20 09:16:23,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4745780.0, ans=0.125 2024-08-20 09:16:23,975 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4745780.0, ans=0.1 2024-08-20 09:16:36,693 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 23 from Vox, 26 from AS 2024-08-20 09:16:49,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4745880.0, ans=0.04949747468305833 2024-08-20 09:16:52,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4745980.0, ans=0.0 2024-08-20 09:17:17,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4746080.0, ans=0.2 2024-08-20 09:17:19,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4746080.0, ans=0.125 2024-08-20 09:17:26,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4746080.0, ans=0.0 2024-08-20 09:17:31,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4746180.0, ans=0.0 2024-08-20 09:17:32,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4746180.0, ans=0.1 2024-08-20 09:17:35,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4746180.0, ans=0.125 2024-08-20 09:17:47,691 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 450, loss[loss=0.111, beats_loss=0.009146, ecapa_loss=0.0001625, 
whisper_loss=0.1002, over 20267.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01028, ecapa_loss=0.0001394, whisper_loss=0.08821, over 3360307.72 frames. ], batch size: 82, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:18:12,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.270e+01 2.468e+01 2.712e+01 4.275e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-20 09:18:24,133 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 15 from LS+wenet, 16 from Vox, 20 from AS 2024-08-20 09:18:30,662 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 16 from LS+wenet, 15 from Vox, 21 from AS 2024-08-20 09:18:36,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-20 09:19:18,907 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 500, loss[loss=0.1085, beats_loss=0.00856, ecapa_loss=0.000139, whisper_loss=0.09851, over 20157.00 frames. ], tot_loss[loss=0.09955, beats_loss=0.01021, ecapa_loss=0.0001403, whisper_loss=0.08794, over 3434503.68 frames. ], batch size: 79, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:19:19,107 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS 2024-08-20 09:19:45,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4746880.0, ans=0.0 2024-08-20 09:20:05,524 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 from AS 2024-08-20 09:20:23,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4747080.0, ans=15.0 2024-08-20 09:20:27,072 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
17 from LS+wenet, 22 from Vox, 27 from AS 2024-08-20 09:20:50,229 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 550, loss[loss=0.1031, beats_loss=0.009343, ecapa_loss=0.0001627, whisper_loss=0.09209, over 20252.00 frames. ], tot_loss[loss=0.09985, beats_loss=0.01014, ecapa_loss=0.0001408, whisper_loss=0.0883, over 3478108.98 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:20:50,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4747280.0, ans=0.0 2024-08-20 09:21:05,945 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 10 from LS+wenet, 19 from Vox, 21 from AS 2024-08-20 09:21:14,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.517e+01 2.843e+01 4.116e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-20 09:21:26,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4747480.0, ans=0.0 2024-08-20 09:21:35,191 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 37 from LS+wenet, 18 from Vox, 35 from AS 2024-08-20 09:21:40,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-08-20 09:21:44,031 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 39 from LS+wenet, 18 from Vox, 33 from AS 2024-08-20 09:21:45,179 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 from AS 2024-08-20 09:21:53,273 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 from AS 2024-08-20 09:22:22,668 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 600, loss[loss=0.092, beats_loss=0.01211, ecapa_loss=0.0001037, whisper_loss=0.07885, over 24295.00 frames. 
], tot_loss[loss=0.09979, beats_loss=0.01017, ecapa_loss=0.0001404, whisper_loss=0.08822, over 3527878.19 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:22:39,658 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 12 from LS+wenet, 19 from Vox, 24 from AS 2024-08-20 09:22:55,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4747880.0, ans=10.0 2024-08-20 09:23:24,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4748080.0, ans=0.125 2024-08-20 09:23:24,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-20 09:23:25,006 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 11 from Vox, 29 from AS 2024-08-20 09:23:34,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4748180.0, ans=0.05 2024-08-20 09:23:35,812 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 from AS 2024-08-20 09:23:43,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4748180.0, ans=0.0 2024-08-20 09:23:46,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4748180.0, ans=0.2 2024-08-20 09:23:52,931 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 650, loss[loss=0.1095, beats_loss=0.009602, ecapa_loss=0.0001585, whisper_loss=0.09828, over 15032.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0102, ecapa_loss=0.00014, whisper_loss=0.08873, over 3574734.36 frames. 
], batch size: 60, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:23:58,627 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 13 from LS+wenet, 20 from Vox, 29 from AS 2024-08-20 09:24:00,288 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 from AS 2024-08-20 09:24:02,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4748280.0, ans=0.1 2024-08-20 09:24:17,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.327e+01 2.614e+01 2.843e+01 3.937e+01, threshold=5.228e+01, percent-clipped=0.0 2024-08-20 09:24:23,956 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS 2024-08-20 09:24:49,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4748580.0, ans=0.125 2024-08-20 09:25:12,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4748680.0, ans=0.0 2024-08-20 09:25:21,275 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 700, loss[loss=0.1058, beats_loss=0.01232, ecapa_loss=0.0001441, whisper_loss=0.09205, over 21563.00 frames. ], tot_loss[loss=0.09967, beats_loss=0.01022, ecapa_loss=0.000141, whisper_loss=0.08804, over 3613557.31 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:25:21,424 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 from AS 2024-08-20 09:25:33,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4748780.0, ans=0.125 2024-08-20 09:25:39,494 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 09:26:00,329 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.291e+05 2024-08-20 09:26:08,722 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 09:26:32,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4749180.0, ans=0.125 2024-08-20 09:26:48,802 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 750, loss[loss=0.06937, beats_loss=0.0129, ecapa_loss=9.421e-05, whisper_loss=0.05553, over 15331.00 frames. ], tot_loss[loss=0.09935, beats_loss=0.01033, ecapa_loss=0.0001395, whisper_loss=0.08763, over 3659935.05 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:26:56,464 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 33 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 09:27:13,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.302e+01 2.530e+01 2.816e+01 3.828e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 09:27:29,932 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:27:36,498 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4749480.0, ans=0.125 2024-08-20 09:27:37,405 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 09:27:39,564 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 09:27:54,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4749580.0, ans=0.125 2024-08-20 09:28:00,201 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-20 09:28:03,306 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 09:28:18,117 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 800, loss[loss=0.08641, beats_loss=0.01044, ecapa_loss=0.0001126, whisper_loss=0.07484, over 17460.00 frames. ], tot_loss[loss=0.099, beats_loss=0.01032, ecapa_loss=0.0001406, whisper_loss=0.08727, over 3667825.94 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:28:23,940 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 12 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 09:28:48,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4749880.0, ans=0.0 2024-08-20 09:28:54,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4749980.0, ans=0.035 2024-08-20 09:29:10,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4750080.0, ans=0.125 2024-08-20 09:29:15,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4750080.0, ans=0.035 2024-08-20 09:29:22,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4750080.0, ans=0.0 2024-08-20 09:29:44,831 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
23 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-20 09:29:46,188 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 850, loss[loss=0.09048, beats_loss=0.01218, ecapa_loss=0.0001049, whisper_loss=0.07725, over 22972.00 frames. ], tot_loss[loss=0.09911, beats_loss=0.01034, ecapa_loss=0.0001391, whisper_loss=0.08738, over 3695878.77 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:29:50,621 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 09:29:51,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0 2024-08-20 09:30:07,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4750380.0, ans=0.0 2024-08-20 09:30:09,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4750380.0, ans=0.125 2024-08-20 09:30:09,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4750380.0, ans=0.2 2024-08-20 09:30:11,284 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.195e+01 2.440e+01 2.729e+01 3.750e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 09:30:15,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4750380.0, ans=0.1 2024-08-20 09:30:18,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4750380.0, ans=0.0 2024-08-20 09:30:49,979 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
20 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 09:31:15,782 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 900, loss[loss=0.1024, beats_loss=0.009074, ecapa_loss=0.0001852, whisper_loss=0.09148, over 19584.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01023, ecapa_loss=0.0001392, whisper_loss=0.08878, over 3725470.80 frames. ], batch size: 82, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:31:25,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4750780.0, ans=0.0 2024-08-20 09:31:53,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4750980.0, ans=0.0 2024-08-20 09:32:02,312 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 09:32:13,647 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 18 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-20 09:32:37,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4751180.0, ans=0.125 2024-08-20 09:32:43,812 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 950, loss[loss=0.09521, beats_loss=0.009623, ecapa_loss=0.0001344, whisper_loss=0.08425, over 17460.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01017, ecapa_loss=0.0001398, whisper_loss=0.08877, over 3721564.48 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:33:04,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4751380.0, ans=0.1 2024-08-20 09:33:08,812 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.210e+01 2.427e+01 2.730e+01 1.118e+02, threshold=4.854e+01, percent-clipped=2.0 2024-08-20 09:34:02,576 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 09:34:03,923 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 09:34:10,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-08-20 09:34:12,745 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1000, loss[loss=0.1183, beats_loss=0.007795, ecapa_loss=0.0001511, whisper_loss=0.109, over 18626.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0101, ecapa_loss=0.0001402, whisper_loss=0.08933, over 3711533.38 frames. ], batch size: 72, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:34:28,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4751780.0, ans=0.0 2024-08-20 09:34:42,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=22.5 2024-08-20 09:35:11,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4752080.0, ans=0.1 2024-08-20 09:35:15,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4752080.0, ans=0.125 2024-08-20 09:35:17,992 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 09:35:23,786 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=22.5 2024-08-20 09:35:35,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4752180.0, ans=0.1 2024-08-20 09:35:41,253 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 09:35:42,296 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1050, loss[loss=0.1063, beats_loss=0.01108, ecapa_loss=0.0001245, whisper_loss=0.09393, over 23559.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01014, ecapa_loss=0.0001391, whisper_loss=0.08923, over 3702071.75 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:36:01,818 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 09:36:08,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.242e+01 2.597e+01 2.833e+01 4.409e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-20 09:36:14,128 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 16 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-20 09:36:21,027 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 09:36:41,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4752580.0, ans=0.0 2024-08-20 09:36:53,207 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 09:37:02,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4752680.0, ans=0.0 2024-08-20 09:37:05,054 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 27 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 09:37:11,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4752780.0, ans=0.125 2024-08-20 09:37:12,225 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1100, loss[loss=0.09962, beats_loss=0.01043, ecapa_loss=0.0001377, whisper_loss=0.08781, over 16632.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01021, ecapa_loss=0.0001381, whisper_loss=0.0895, over 3735971.46 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:37:19,448 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 09:37:23,209 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 09:37:54,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4752980.0, ans=0.2 2024-08-20 09:37:57,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4752980.0, ans=0.0 2024-08-20 09:38:08,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4753080.0, ans=0.125 2024-08-20 09:38:08,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4753080.0, ans=0.0 2024-08-20 09:38:20,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4753080.0, ans=0.125 2024-08-20 09:38:34,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4753180.0, ans=0.125 2024-08-20 09:38:37,328 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-20 09:38:42,095 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1150, loss[loss=0.06947, beats_loss=0.009411, ecapa_loss=0.0001366, whisper_loss=0.05869, over 16540.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01024, ecapa_loss=0.000138, whisper_loss=0.08926, over 3740337.75 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:38:48,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4753280.0, ans=0.125 2024-08-20 09:39:05,438 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 38 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 09:39:06,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.346e+01 2.635e+01 2.990e+01 2.498e+02, threshold=5.271e+01, percent-clipped=4.0 2024-08-20 09:39:10,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4753380.0, ans=0.125 2024-08-20 09:39:50,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4753580.0, ans=0.125 2024-08-20 09:40:03,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.07 vs. limit=22.5 2024-08-20 09:40:05,574 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.46 vs. limit=10.0 2024-08-20 09:40:11,330 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1200, loss[loss=0.1233, beats_loss=0.01005, ecapa_loss=0.0001283, whisper_loss=0.112, over 18350.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001372, whisper_loss=0.0894, over 3692987.78 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:40:13,660 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=4753780.0, ans=12.0 2024-08-20 09:40:19,005 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. 
limit=10.0 2024-08-20 09:40:27,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4753880.0, ans=0.0 2024-08-20 09:40:39,863 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 09:40:59,217 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 09:41:11,405 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-20 09:41:39,236 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1250, loss[loss=0.09984, beats_loss=0.0111, ecapa_loss=0.0001218, whisper_loss=0.08752, over 23574.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.08952, over 3729007.88 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:41:40,698 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 09:42:05,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.750e+01 2.979e+01 6.876e+01, threshold=5.500e+01, percent-clipped=2.0 2024-08-20 09:42:13,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4754480.0, ans=0.125 2024-08-20 09:42:35,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4754580.0, ans=0.125 2024-08-20 09:42:42,790 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 37 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 09:42:59,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4754680.0, ans=0.2 2024-08-20 09:43:07,363 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1300, loss[loss=0.09295, beats_loss=0.01064, ecapa_loss=0.0001626, whisper_loss=0.08068, over 18932.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001384, whisper_loss=0.08924, over 3759323.45 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:43:13,262 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 09:43:13,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4754780.0, ans=0.0 2024-08-20 09:43:17,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4754780.0, ans=0.125 2024-08-20 09:43:51,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-08-20 09:43:55,394 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2024-08-20 09:44:22,432 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 09:44:32,607 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 19 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-20 09:44:37,941 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1350, loss[loss=0.1018, beats_loss=0.00943, ecapa_loss=0.0001415, whisper_loss=0.09099, over 22738.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001377, whisper_loss=0.0889, over 3768702.43 frames. 
], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:44:42,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4755280.0, ans=0.125 2024-08-20 09:44:56,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4755380.0, ans=0.0 2024-08-20 09:45:01,366 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 13 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 09:45:04,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.108e+01 2.418e+01 2.623e+01 3.290e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-20 09:45:09,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2024-08-20 09:45:15,573 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 09:45:18,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4755480.0, ans=0.1 2024-08-20 09:45:22,837 WARNING [optim.py:496] (2/4) Scaling gradients by 0.032859351485967636, model_norm_threshold=48.36314392089844 2024-08-20 09:45:23,002 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.250e+05, grad_sumsq=3.697e+04, orig_rms_sq=8.792e+00 2024-08-20 09:45:23,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4755480.0, ans=0.0 2024-08-20 09:45:27,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4755480.0, ans=0.0 2024-08-20 09:45:37,532 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4755580.0, ans=0.125 2024-08-20 09:45:37,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2024-08-20 09:45:37,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-08-20 09:45:55,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4755680.0, ans=0.125 2024-08-20 09:46:06,428 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 09:46:08,151 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1400, loss[loss=0.0847, beats_loss=0.01052, ecapa_loss=0.000134, whisper_loss=0.07284, over 14480.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001369, whisper_loss=0.08923, over 3796051.13 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:46:08,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4755780.0, ans=0.125 2024-08-20 09:46:24,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. limit=22.5 2024-08-20 09:46:25,749 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 09:46:50,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4755980.0, ans=0.125 2024-08-20 09:46:55,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4755980.0, ans=0.125 2024-08-20 09:46:57,559 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.77 vs. limit=10.0 2024-08-20 09:47:15,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4756080.0, ans=0.1 2024-08-20 09:47:35,286 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1450, loss[loss=0.07806, beats_loss=0.01403, ecapa_loss=0.0001031, whisper_loss=0.063, over 15089.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001368, whisper_loss=0.08939, over 3782289.67 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:47:44,375 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-20 09:47:50,521 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 09:48:00,273 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4756380.0, ans=0.0 2024-08-20 09:48:01,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.208e+01 2.529e+01 2.783e+01 1.472e+03, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 09:48:18,733 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 
15 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 09:48:21,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4756480.0, ans=0.0 2024-08-20 09:48:25,417 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-20 09:49:08,183 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 09:49:08,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-08-20 09:49:15,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4756680.0, ans=0.125 2024-08-20 09:49:24,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4756680.0, ans=0.2 2024-08-20 09:49:30,533 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1500, loss[loss=0.1107, beats_loss=0.01066, ecapa_loss=0.0001169, whisper_loss=0.09882, over 22627.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001361, whisper_loss=0.08982, over 3786807.85 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:49:30,691 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 09:49:33,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4756780.0, ans=0.09899494936611666 2024-08-20 09:49:44,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4756780.0, ans=0.0 2024-08-20 09:49:56,384 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 09:50:16,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4756980.0, ans=0.0 2024-08-20 09:50:24,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4757080.0, ans=0.0 2024-08-20 09:50:43,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4757180.0, ans=0.07 2024-08-20 09:51:03,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1550, loss[loss=0.09405, beats_loss=0.0106, ecapa_loss=0.0001246, whisper_loss=0.0822, over 23490.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01045, ecapa_loss=0.0001358, whisper_loss=0.08827, over 3773502.70 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:51:06,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2024-08-20 09:51:07,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4757280.0, ans=0.125 2024-08-20 09:51:27,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4757380.0, ans=0.0 2024-08-20 09:51:30,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.231e+01 2.477e+01 2.793e+01 4.044e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 09:51:33,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4757380.0, ans=0.0 2024-08-20 09:51:52,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.69 vs. 
limit=22.5 2024-08-20 09:51:57,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4757580.0, ans=0.2 2024-08-20 09:51:57,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. limit=10.0 2024-08-20 09:52:09,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-20 09:52:14,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2024-08-20 09:52:35,564 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1600, loss[loss=0.09026, beats_loss=0.01135, ecapa_loss=0.0001262, whisper_loss=0.07765, over 21355.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01037, ecapa_loss=0.0001361, whisper_loss=0.08872, over 3774778.10 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:52:45,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4757780.0, ans=0.125 2024-08-20 09:52:45,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4757780.0, ans=0.0 2024-08-20 09:52:46,625 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 12 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 09:53:00,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.26 vs. 
limit=15.0 2024-08-20 09:53:09,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4757880.0, ans=0.125 2024-08-20 09:53:09,529 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-08-20 09:53:26,606 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 15 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 09:53:38,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4758080.0, ans=0.125 2024-08-20 09:53:43,541 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 09:53:44,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4758080.0, ans=0.125 2024-08-20 09:54:06,365 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1650, loss[loss=0.1048, beats_loss=0.008667, ecapa_loss=0.0001366, whisper_loss=0.09473, over 22527.00 frames. ], tot_loss[loss=0.0999, beats_loss=0.01034, ecapa_loss=0.0001378, whisper_loss=0.08818, over 3757609.28 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:54:11,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5 2024-08-20 09:54:16,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4758280.0, ans=0.09899494936611666 2024-08-20 09:54:32,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.203e+01 2.449e+01 2.785e+01 3.857e+01, threshold=4.898e+01, percent-clipped=0.0 2024-08-20 09:54:36,459 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 
14 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 09:54:46,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-20 09:54:57,699 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 09:55:26,870 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 09:55:32,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4758680.0, ans=0.1 2024-08-20 09:55:35,278 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1700, loss[loss=0.09915, beats_loss=0.008555, ecapa_loss=0.0001148, whisper_loss=0.08945, over 14429.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01035, ecapa_loss=0.0001382, whisper_loss=0.08835, over 3782950.92 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:55:37,397 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 19 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-20 09:55:42,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4758780.0, ans=0.0 2024-08-20 09:56:30,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4759080.0, ans=0.0 2024-08-20 09:56:52,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4759180.0, ans=0.0 2024-08-20 09:57:06,565 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1750, loss[loss=0.1016, beats_loss=0.01029, ecapa_loss=0.0001258, whisper_loss=0.09006, over 14414.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01023, ecapa_loss=0.0001393, whisper_loss=0.08907, over 3766670.04 frames. 
], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:57:23,331 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:57:33,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.295e+01 2.510e+01 2.716e+01 9.441e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 09:57:53,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4759480.0, ans=0.04949747468305833 2024-08-20 09:57:54,257 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 09:58:28,885 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 09:58:34,333 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1800, loss[loss=0.1164, beats_loss=0.008964, ecapa_loss=0.000169, whisper_loss=0.1058, over 19603.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01024, ecapa_loss=0.0001384, whisper_loss=0.08947, over 3775670.66 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:58:56,267 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 18 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 09:59:03,339 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 10:00:01,332 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1850, loss[loss=0.126, beats_loss=0.007935, ecapa_loss=0.0001399, whisper_loss=0.1167, over 22406.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01022, ecapa_loss=0.000138, whisper_loss=0.08926, over 3758035.32 frames. 
], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:00:22,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4760380.0, ans=0.125 2024-08-20 10:00:27,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.285e+01 2.493e+01 2.881e+01 4.103e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-20 10:00:31,394 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4760380.0, ans=0.125 2024-08-20 10:00:49,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-08-20 10:00:53,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4760580.0, ans=0.125 2024-08-20 10:01:12,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4760680.0, ans=0.125 2024-08-20 10:01:14,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4760680.0, ans=0.1 2024-08-20 10:01:28,634 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1900, loss[loss=0.1088, beats_loss=0.009565, ecapa_loss=0.0001199, whisper_loss=0.09806, over 19598.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01026, ecapa_loss=0.0001368, whisper_loss=0.08916, over 3768035.55 frames. 
], batch size: 72, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:01:38,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4760780.0, ans=0.0 2024-08-20 10:01:42,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4760780.0, ans=0.09899494936611666 2024-08-20 10:01:49,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4760880.0, ans=0.125 2024-08-20 10:02:13,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4760980.0, ans=0.125 2024-08-20 10:02:24,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4761080.0, ans=0.0 2024-08-20 10:02:38,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-20 10:02:38,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2024-08-20 10:02:41,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-08-20 10:02:48,346 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 18 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 10:02:54,994 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 1950, loss[loss=0.13, beats_loss=0.008157, ecapa_loss=0.0001272, whisper_loss=0.1205, over 15430.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01031, ecapa_loss=0.0001369, whisper_loss=0.08879, over 3744571.36 frames. 
], batch size: 57, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:03:19,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.238e+01 2.475e+01 2.855e+01 5.978e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-20 10:03:36,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4761480.0, ans=0.0 2024-08-20 10:03:37,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4761480.0, ans=0.125 2024-08-20 10:03:42,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4761480.0, ans=0.1 2024-08-20 10:03:42,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4761480.0, ans=0.125 2024-08-20 10:03:55,180 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4761580.0, ans=0.2 2024-08-20 10:03:57,910 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 10:04:00,520 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.586e+00 2024-08-20 10:04:02,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4761680.0, ans=0.025 2024-08-20 10:04:17,726 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 10:04:20,883 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2000, loss[loss=0.08646, beats_loss=0.00914, ecapa_loss=0.000123, whisper_loss=0.07609, over 15641.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001356, whisper_loss=0.08848, over 3728979.37 frames. 
], batch size: 60, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:04:33,106 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-20 10:04:37,204 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 10:04:42,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4761880.0, ans=0.125 2024-08-20 10:05:32,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2024-08-20 10:05:37,374 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 10:05:46,733 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2050, loss[loss=0.1005, beats_loss=0.009398, ecapa_loss=0.0001275, whisper_loss=0.08979, over 14181.00 frames. ], tot_loss[loss=0.09951, beats_loss=0.01048, ecapa_loss=0.0001354, whisper_loss=0.08768, over 3699726.94 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:06:13,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.228e+01 2.469e+01 2.687e+01 4.353e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-20 10:06:19,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4762380.0, ans=0.125 2024-08-20 10:06:34,124 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 10:06:48,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4762580.0, ans=0.0 2024-08-20 10:06:51,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4762580.0, ans=0.125 2024-08-20 10:06:58,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4762680.0, ans=0.015 2024-08-20 10:07:12,938 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2100, loss[loss=0.09799, beats_loss=0.0101, ecapa_loss=0.0001456, whisper_loss=0.08643, over 19354.00 frames. ], tot_loss[loss=0.09943, beats_loss=0.01051, ecapa_loss=0.0001343, whisper_loss=0.08757, over 3675434.54 frames. ], batch size: 80, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:07:27,873 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2024-08-20 10:07:45,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4762880.0, ans=0.125 2024-08-20 10:07:59,257 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2024-08-20 10:08:03,255 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 10:08:19,689 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 10:08:38,806 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2150, loss[loss=0.0919, beats_loss=0.007431, ecapa_loss=0.0001924, whisper_loss=0.08254, over 14901.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0104, ecapa_loss=0.0001343, whisper_loss=0.08844, over 3661541.20 frames. 
], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:08:40,211 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-20 10:08:45,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4763280.0, ans=0.05 2024-08-20 10:09:05,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.500e+01 2.856e+01 5.859e+01, threshold=5.000e+01, percent-clipped=1.0 2024-08-20 10:09:35,787 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 24 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 10:09:40,638 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 10:09:44,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4763580.0, ans=0.0 2024-08-20 10:09:44,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4763580.0, ans=0.1 2024-08-20 10:09:51,143 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 10:09:53,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4763680.0, ans=0.0 2024-08-20 10:10:02,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4763680.0, ans=0.2 2024-08-20 10:10:05,300 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2200, loss[loss=0.1069, beats_loss=0.009862, ecapa_loss=0.0001448, whisper_loss=0.09557, over 16437.00 frames. ], tot_loss[loss=0.09979, beats_loss=0.01044, ecapa_loss=0.0001346, whisper_loss=0.088, over 3681766.84 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:10:22,491 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 10:10:25,959 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 10:10:38,918 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 10:11:03,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4764080.0, ans=0.0 2024-08-20 10:11:03,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4764080.0, ans=0.1 2024-08-20 10:11:25,963 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-20 10:11:27,010 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 15 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 10:11:30,008 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2250, loss[loss=0.1228, beats_loss=0.01043, ecapa_loss=0.0001434, whisper_loss=0.1109, over 23585.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01038, ecapa_loss=0.0001348, whisper_loss=0.08876, over 3675349.21 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:11:55,372 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.206e+01 2.415e+01 2.754e+01 4.736e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-20 10:12:01,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4764380.0, ans=0.5 2024-08-20 10:12:09,982 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. 
limit=15.0 2024-08-20 10:12:11,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4764480.0, ans=0.1 2024-08-20 10:12:12,170 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2024-08-20 10:12:21,597 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 11 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-20 10:12:30,029 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 30 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 10:12:36,758 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 10:12:55,546 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2300, loss[loss=0.09816, beats_loss=0.01187, ecapa_loss=0.0001223, whisper_loss=0.08506, over 20932.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01047, ecapa_loss=0.0001364, whisper_loss=0.08868, over 3706329.09 frames. ], batch size: 85, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:13:06,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=4764780.0, ans=0.2 2024-08-20 10:13:06,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4764780.0, ans=0.0 2024-08-20 10:13:14,309 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 26 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-20 10:13:31,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4764980.0, ans=0.0 2024-08-20 10:13:38,450 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 10:14:21,861 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2350, loss[loss=0.1025, beats_loss=0.008588, ecapa_loss=0.0002216, whisper_loss=0.09172, over 12395.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001375, whisper_loss=0.08956, over 3742573.12 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:14:25,316 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 10:14:33,048 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 10:14:48,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.264e+01 2.559e+01 2.893e+01 5.027e+01, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 10:14:53,081 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 10:15:16,216 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 10:15:25,564 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 23 from LS+wenet, 16 from Vox, 11 fro AS 2024-08-20 10:15:27,514 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 15 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 10:15:27,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4765580.0, ans=0.0 2024-08-20 10:15:40,624 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-20 10:15:40,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4765680.0, ans=0.125 2024-08-20 10:15:44,148 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4765680.0, ans=0.04949747468305833 2024-08-20 10:15:46,592 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2400, loss[loss=0.1098, beats_loss=0.01083, ecapa_loss=0.0001183, whisper_loss=0.09776, over 23151.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001394, whisper_loss=0.09008, over 3692254.64 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:16:02,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4765880.0, ans=0.025 2024-08-20 10:16:04,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4765880.0, ans=0.0 2024-08-20 10:16:05,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4765880.0, ans=0.125 2024-08-20 10:16:25,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4765980.0, ans=0.125 2024-08-20 10:16:25,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4765980.0, ans=0.025 2024-08-20 10:16:39,785 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 10:16:45,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4766080.0, ans=0.125 2024-08-20 10:16:53,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4766180.0, ans=0.0 2024-08-20 10:16:55,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4766180.0, ans=0.125 2024-08-20 10:17:02,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4766180.0, ans=0.1 2024-08-20 10:17:05,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4766180.0, ans=0.0 2024-08-20 10:17:11,565 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2450, loss[loss=0.1019, beats_loss=0.01068, ecapa_loss=0.0001635, whisper_loss=0.08959, over 22872.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001389, whisper_loss=0.08976, over 3704262.46 frames. 
], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:17:21,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4766280.0, ans=0.04949747468305833 2024-08-20 10:17:35,211 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4766380.0, ans=0.2 2024-08-20 10:17:38,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.311e+01 2.583e+01 2.758e+01 5.133e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-20 10:17:41,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4766380.0, ans=0.0 2024-08-20 10:18:18,061 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:18:18,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4766580.0, ans=0.125 2024-08-20 10:18:28,327 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 8 from Vox, 32 fro AS 2024-08-20 10:18:31,970 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 10:18:40,995 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2500, loss[loss=0.1116, beats_loss=0.01062, ecapa_loss=0.000147, whisper_loss=0.09948, over 18673.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001375, whisper_loss=0.09035, over 3723005.54 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:18:52,328 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:19:04,949 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 10:19:13,981 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 
18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-20 10:19:19,719 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 10:19:23,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4766980.0, ans=0.035 2024-08-20 10:19:39,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4767080.0, ans=0.125 2024-08-20 10:19:51,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4767080.0, ans=0.2 2024-08-20 10:20:02,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4767180.0, ans=0.125 2024-08-20 10:20:10,646 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 10:20:11,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4767280.0, ans=0.2 2024-08-20 10:20:12,090 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2550, loss[loss=0.1037, beats_loss=0.01026, ecapa_loss=0.0001315, whisper_loss=0.0921, over 19062.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001366, whisper_loss=0.08989, over 3712675.74 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:20:35,318 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 10:20:37,210 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 20 from LS+wenet, 35 from Vox, 39 fro AS 2024-08-20 10:20:38,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.90 vs. 
limit=15.0 2024-08-20 10:20:39,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.312e+01 2.481e+01 2.687e+01 3.912e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 10:20:39,769 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 10:20:50,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4767480.0, ans=0.0 2024-08-20 10:21:03,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4767480.0, ans=0.2 2024-08-20 10:21:35,008 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 10:21:42,276 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2600, loss[loss=0.09868, beats_loss=0.01286, ecapa_loss=0.0001286, whisper_loss=0.08454, over 23149.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001377, whisper_loss=0.08987, over 3753423.23 frames. ], batch size: 95, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:21:51,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4767780.0, ans=0.2 2024-08-20 10:21:54,822 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 10:21:56,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4767780.0, ans=0.09899494936611666 2024-08-20 10:22:01,609 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 23 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-20 10:22:05,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4767880.0, ans=0.0 2024-08-20 10:22:10,839 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 10:22:14,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4767880.0, ans=0.125 2024-08-20 10:22:44,036 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 10:23:02,456 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4768180.0, ans=0.0 2024-08-20 10:23:10,964 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2650, loss[loss=0.1047, beats_loss=0.01051, ecapa_loss=0.000132, whisper_loss=0.0929, over 18687.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.08949, over 3726441.24 frames. ], batch size: 74, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:23:38,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.210e+01 2.428e+01 2.721e+01 4.084e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 10:23:49,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4768480.0, ans=0.09899494936611666 2024-08-20 10:23:53,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4768480.0, ans=0.1 2024-08-20 10:24:01,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4768480.0, ans=0.125 2024-08-20 10:24:09,520 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-20 10:24:10,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2024-08-20 10:24:20,277 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 10:24:40,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-20 10:24:41,439 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2700, loss[loss=0.102, beats_loss=0.01016, ecapa_loss=0.0001318, whisper_loss=0.09048, over 22809.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01027, ecapa_loss=0.0001386, whisper_loss=0.08914, over 3738819.85 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:25:17,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2024-08-20 10:25:23,823 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 10:25:27,625 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 35 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 10:25:46,439 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 10:26:12,618 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2750, loss[loss=0.08803, beats_loss=0.01215, ecapa_loss=0.0001323, whisper_loss=0.07456, over 22267.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01026, ecapa_loss=0.0001376, whisper_loss=0.08907, over 3779999.49 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:26:19,704 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-20 10:26:38,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.290e+01 2.547e+01 2.861e+01 3.965e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-20 10:26:54,032 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 10:27:02,316 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:27:17,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0 2024-08-20 10:27:24,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4769680.0, ans=0.0 2024-08-20 10:27:40,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4769780.0, ans=0.125 2024-08-20 10:27:41,795 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2800, loss[loss=0.1076, beats_loss=0.01041, ecapa_loss=0.0001483, whisper_loss=0.09574, over 17867.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01035, ecapa_loss=0.0001372, whisper_loss=0.08897, over 3782980.34 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:28:06,533 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 10:28:11,821 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 10:28:14,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4769880.0, ans=0.1 2024-08-20 10:28:32,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4769980.0, ans=0.125 2024-08-20 10:28:32,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4769980.0, ans=0.125 2024-08-20 10:28:37,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4770080.0, ans=0.1 2024-08-20 10:28:41,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4770080.0, ans=0.0 2024-08-20 10:28:54,579 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 10:29:05,491 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 10:29:10,467 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2850, loss[loss=0.1124, beats_loss=0.009108, ecapa_loss=0.0001119, whisper_loss=0.1022, over 18262.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01034, ecapa_loss=0.0001371, whisper_loss=0.08902, over 3746386.46 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:29:23,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. limit=15.0 2024-08-20 10:29:25,663 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. 
limit=10.0 2024-08-20 10:29:27,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4770380.0, ans=0.125 2024-08-20 10:29:32,610 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 10:29:37,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.626e+01 2.251e+01 2.450e+01 2.765e+01 5.044e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-20 10:30:01,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4770480.0, ans=0.1 2024-08-20 10:30:02,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4770580.0, ans=0.125 2024-08-20 10:30:09,359 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 10:30:10,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4770580.0, ans=0.0 2024-08-20 10:30:11,113 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 10:30:20,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4770680.0, ans=0.2 2024-08-20 10:30:34,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4770680.0, ans=0.1 2024-08-20 10:30:35,274 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 10:30:38,578 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2900, loss[loss=0.1019, beats_loss=0.008357, ecapa_loss=0.0001576, whisper_loss=0.09196, over 19982.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001369, whisper_loss=0.08949, over 3782937.70 frames. ], batch size: 80, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:31:03,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4770880.0, ans=0.125 2024-08-20 10:31:12,736 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 10:31:23,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.79 vs. limit=22.5 2024-08-20 10:31:27,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4770980.0, ans=0.1 2024-08-20 10:31:38,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-20 10:31:51,381 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-20 10:32:08,262 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 2950, loss[loss=0.09114, beats_loss=0.009989, ecapa_loss=0.0001485, whisper_loss=0.07966, over 21132.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01038, ecapa_loss=0.0001384, whisper_loss=0.08896, over 3797537.69 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:32:12,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4771280.0, ans=0.125 2024-08-20 10:32:14,763 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. 
limit=15.0 2024-08-20 10:32:16,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4771280.0, ans=0.125 2024-08-20 10:32:35,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.380e+01 2.528e+01 2.893e+01 7.268e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-20 10:32:47,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4771480.0, ans=0.0 2024-08-20 10:32:52,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4771480.0, ans=0.125 2024-08-20 10:33:02,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2024-08-20 10:33:37,799 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3000, loss[loss=0.1015, beats_loss=0.01066, ecapa_loss=0.0001353, whisper_loss=0.08952, over 22688.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08895, over 3790369.28 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:33:37,799 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 10:34:13,808 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on ASR_libri: loss=0.2557, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2506, over 931116.00 frames. 2024-08-20 10:34:36,575 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on SV_voxceleb1: loss=0.003928, beats_loss=0, ecapa_loss=0.0003928, whisper_loss=0, over 944235.00 frames. 2024-08-20 10:36:13,070 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 10:36:13,075 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 10:36:39,560 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:36:40,529 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-20 10:36:41,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4771880.0, ans=0.0 2024-08-20 10:37:09,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4772080.0, ans=0.1 2024-08-20 10:37:27,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4772180.0, ans=0.0 2024-08-20 10:37:30,409 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2024-08-20 10:37:33,585 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3050, loss[loss=0.1076, beats_loss=0.008205, ecapa_loss=0.0001389, whisper_loss=0.09805, over 13577.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.08962, over 3786850.08 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:37:43,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4772280.0, ans=0.125 2024-08-20 10:37:58,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.325e+01 2.539e+01 2.982e+01 4.388e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-20 10:38:00,867 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
34 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 10:38:11,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4772480.0, ans=0.125 2024-08-20 10:38:13,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4772480.0, ans=0.125 2024-08-20 10:38:19,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-20 10:38:38,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4772680.0, ans=0.09899494936611666 2024-08-20 10:38:39,923 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 20 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-20 10:38:40,385 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.294e+01 2024-08-20 10:38:41,244 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 10:38:47,400 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 10:38:50,902 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-20 10:38:55,599 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3100, loss[loss=0.09944, beats_loss=0.01206, ecapa_loss=0.0001101, whisper_loss=0.08628, over 22202.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01035, ecapa_loss=0.0001391, whisper_loss=0.08958, over 3769025.63 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:39:14,091 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
27 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 10:39:25,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4772880.0, ans=0.0 2024-08-20 10:39:34,390 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2024-08-20 10:39:53,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4773080.0, ans=0.0 2024-08-20 10:40:17,700 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3150, loss[loss=0.09971, beats_loss=0.008696, ecapa_loss=0.0001512, whisper_loss=0.08951, over 14894.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.08963, over 3809343.03 frames. ], batch size: 57, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:40:31,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-20 10:40:42,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.206e+01 2.480e+01 2.972e+01 5.332e+01, threshold=4.960e+01, percent-clipped=1.0 2024-08-20 10:40:49,807 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2024-08-20 10:41:03,265 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-20 10:41:12,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4773580.0, ans=0.125 2024-08-20 10:41:23,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.62 vs. 
limit=15.0 2024-08-20 10:41:27,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4773680.0, ans=0.2 2024-08-20 10:41:33,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4773680.0, ans=0.1 2024-08-20 10:41:34,162 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2024-08-20 10:41:38,129 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3200, loss[loss=0.1243, beats_loss=0.009696, ecapa_loss=0.0001706, whisper_loss=0.1129, over 22451.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001406, whisper_loss=0.08988, over 3788172.31 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:41:52,794 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 10:42:02,715 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 10:42:21,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4773980.0, ans=0.0 2024-08-20 10:42:59,580 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3250, loss[loss=0.1237, beats_loss=0.009984, ecapa_loss=0.0001158, whisper_loss=0.1125, over 24469.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.09004, over 3770459.16 frames. 
], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:43:02,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4774280.0, ans=0.125 2024-08-20 10:43:04,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4774280.0, ans=0.125 2024-08-20 10:43:12,198 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 10:43:16,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4774380.0, ans=0.125 2024-08-20 10:43:25,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.267e+01 2.440e+01 2.710e+01 3.634e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 10:43:28,089 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 10:43:50,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4774580.0, ans=0.125 2024-08-20 10:43:53,634 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 10:43:57,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4774580.0, ans=0.95 2024-08-20 10:44:03,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4774580.0, ans=0.2 2024-08-20 10:44:12,489 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 10:44:25,477 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3300, loss[loss=0.1056, beats_loss=0.01074, ecapa_loss=0.0001609, whisper_loss=0.09329, over 19934.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08975, over 3780368.75 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:44:28,109 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.605e+01 2024-08-20 10:44:29,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4774780.0, ans=0.125 2024-08-20 10:44:33,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4774780.0, ans=0.1 2024-08-20 10:44:53,745 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 10:44:53,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4774880.0, ans=0.1 2024-08-20 10:45:29,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-20 10:45:30,594 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-20 10:45:38,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4775180.0, ans=0.125 2024-08-20 10:45:50,682 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3350, loss[loss=0.1002, beats_loss=0.00955, ecapa_loss=0.0001761, whisper_loss=0.08889, over 13028.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.09055, over 3788614.06 frames. ], batch size: 53, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:45:50,881 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
26 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-20 10:46:06,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4775380.0, ans=0.125 2024-08-20 10:46:08,773 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 10:46:14,984 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.82 vs. limit=15.0 2024-08-20 10:46:16,921 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.283e+01 2.507e+01 2.665e+01 5.653e+01, threshold=5.015e+01, percent-clipped=1.0 2024-08-20 10:46:17,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4775380.0, ans=0.1 2024-08-20 10:46:20,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4775380.0, ans=0.125 2024-08-20 10:46:25,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4775480.0, ans=0.1 2024-08-20 10:46:35,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4775480.0, ans=0.125 2024-08-20 10:47:00,392 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. 
limit=22.5 2024-08-20 10:47:03,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4775680.0, ans=0.125 2024-08-20 10:47:05,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4775680.0, ans=0.125 2024-08-20 10:47:13,171 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3400, loss[loss=0.1385, beats_loss=0.007891, ecapa_loss=0.0001544, whisper_loss=0.1291, over 25141.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.09083, over 3818763.15 frames. ], batch size: 95, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:47:13,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4775780.0, ans=0.0 2024-08-20 10:47:33,854 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4775880.0, ans=0.125 2024-08-20 10:47:50,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4775980.0, ans=0.0 2024-08-20 10:47:51,942 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 10:47:56,911 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 18 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 10:47:59,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4775980.0, ans=0.0 2024-08-20 10:48:00,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4775980.0, ans=0.125 2024-08-20 10:48:36,715 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3450, loss[loss=0.09204, beats_loss=0.01264, ecapa_loss=0.0001215, whisper_loss=0.07818, over 23619.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001415, whisper_loss=0.09039, over 3813224.82 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:48:51,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4776380.0, ans=0.0 2024-08-20 10:49:02,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.189e+01 2.533e+01 2.792e+01 4.546e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-20 10:49:18,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4776480.0, ans=0.125 2024-08-20 10:49:30,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4776580.0, ans=0.0 2024-08-20 10:49:31,980 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 13 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-20 10:49:35,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4776580.0, ans=0.1 2024-08-20 10:49:43,565 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 10:49:48,373 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 10:49:48,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4776680.0, ans=0.0 2024-08-20 10:49:58,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4776780.0, ans=0.125 2024-08-20 10:49:59,514 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3500, loss[loss=0.06947, beats_loss=0.01498, ecapa_loss=0.0001047, whisper_loss=0.05344, over 22748.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.08944, over 3830689.83 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:50:01,489 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 19 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-20 10:50:20,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4776880.0, ans=0.1 2024-08-20 10:50:21,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2024-08-20 10:50:23,918 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 10:51:04,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4777080.0, ans=0.0 2024-08-20 10:51:06,502 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0 2024-08-20 10:51:08,854 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 10:51:10,678 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 10:51:20,660 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 10:51:25,487 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3550, loss[loss=0.1056, beats_loss=0.01016, ecapa_loss=0.0001348, whisper_loss=0.09412, over 21132.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08933, over 3812991.35 frames. 
], batch size: 84, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:51:26,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4777280.0, ans=0.1 2024-08-20 10:51:46,559 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 10:51:47,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4777380.0, ans=0.1 2024-08-20 10:51:53,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.256e+01 2.446e+01 2.719e+01 3.472e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-20 10:51:56,398 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-20 10:51:59,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4777480.0, ans=0.125 2024-08-20 10:52:03,582 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 10:52:05,414 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 10:52:10,778 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 15 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-20 10:52:30,585 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-08-20 10:52:52,781 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3600, loss[loss=0.1174, beats_loss=0.008842, ecapa_loss=0.0001304, whisper_loss=0.1073, over 13735.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.08944, over 3791931.33 frames. 
], batch size: 52, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:52:53,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4777780.0, ans=0.035 2024-08-20 10:52:57,356 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 14 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 10:52:59,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4777780.0, ans=0.125 2024-08-20 10:53:23,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4777880.0, ans=0.05 2024-08-20 10:53:28,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4777880.0, ans=0.0 2024-08-20 10:53:28,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4777880.0, ans=0.125 2024-08-20 10:53:45,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4777980.0, ans=0.015 2024-08-20 10:53:58,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4777980.0, ans=0.0 2024-08-20 10:54:02,875 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 10:54:19,625 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 10:54:48,809 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 10:54:51,598 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3650, loss[loss=0.09798, beats_loss=0.01003, ecapa_loss=0.0001524, whisper_loss=0.08642, over 16837.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01033, ecapa_loss=0.000142, whisper_loss=0.0895, over 3782403.69 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 10:54:52,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4778280.0, ans=0.125 2024-08-20 10:55:33,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.274e+01 2.510e+01 2.811e+01 1.402e+02, threshold=5.019e+01, percent-clipped=2.0 2024-08-20 10:55:36,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4778380.0, ans=0.1 2024-08-20 10:55:42,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4778480.0, ans=0.0 2024-08-20 10:56:53,786 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3700, loss[loss=0.09156, beats_loss=0.01107, ecapa_loss=0.0001483, whisper_loss=0.07901, over 13668.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001411, whisper_loss=0.08925, over 3763070.31 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 10:56:59,528 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.620e+05 2024-08-20 10:57:12,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4778780.0, ans=0.035 2024-08-20 10:57:16,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4778880.0, ans=0.125 2024-08-20 10:57:41,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4778980.0, ans=0.0 2024-08-20 10:57:58,780 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 10:58:27,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4779180.0, ans=0.125 2024-08-20 10:58:34,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4779180.0, ans=0.0 2024-08-20 10:58:44,828 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3750, loss[loss=0.07193, beats_loss=0.0129, ecapa_loss=0.0001043, whisper_loss=0.05799, over 13715.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08937, over 3777292.01 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 10:58:45,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4779280.0, ans=0.2 2024-08-20 10:59:18,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4779380.0, ans=0.1 2024-08-20 10:59:18,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4779380.0, ans=0.0 2024-08-20 10:59:25,381 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 32 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 10:59:28,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.202e+01 2.451e+01 2.797e+01 4.489e+01, threshold=4.903e+01, percent-clipped=0.0 2024-08-20 11:00:04,873 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 11:00:18,509 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 11:00:28,664 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:00:38,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4779680.0, ans=0.0 2024-08-20 11:00:45,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4779680.0, ans=0.0 2024-08-20 11:00:50,450 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3800, loss[loss=0.1284, beats_loss=0.009688, ecapa_loss=0.0001362, whisper_loss=0.1173, over 23365.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001402, whisper_loss=0.08938, over 3793335.60 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:01:00,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4779780.0, ans=0.09899494936611666 2024-08-20 11:01:20,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-20 11:01:28,076 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:01:35,461 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 11:02:30,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4780180.0, ans=0.2 2024-08-20 11:02:57,336 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3850, loss[loss=0.1078, beats_loss=0.01071, ecapa_loss=0.0001348, whisper_loss=0.09575, over 22587.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001418, whisper_loss=0.08965, over 3770796.54 frames. 
], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:03:01,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4780280.0, ans=0.125 2024-08-20 11:03:32,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.347e+01 2.624e+01 2.897e+01 4.079e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-20 11:03:37,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4780480.0, ans=0.125 2024-08-20 11:03:39,620 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-20 11:03:47,115 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0 2024-08-20 11:03:54,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4780480.0, ans=0.1 2024-08-20 11:03:56,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=4780580.0, ans=0.02 2024-08-20 11:04:40,756 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3900, loss[loss=0.1044, beats_loss=0.01236, ecapa_loss=0.0001335, whisper_loss=0.09067, over 20653.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001419, whisper_loss=0.08943, over 3772903.29 frames. 
], batch size: 82, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:04:43,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4780780.0, ans=0.07 2024-08-20 11:04:47,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4780780.0, ans=0.1 2024-08-20 11:04:49,071 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 11:05:03,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4780880.0, ans=0.125 2024-08-20 11:05:13,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4780880.0, ans=0.0 2024-08-20 11:05:27,769 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 11:05:49,430 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 20 from LS+wenet, 15 from Vox, 56 fro AS 2024-08-20 11:05:49,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4781080.0, ans=0.125 2024-08-20 11:06:01,028 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 11:06:03,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4781080.0, ans=0.1 2024-08-20 11:06:05,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4781180.0, ans=0.0 2024-08-20 11:06:07,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4781180.0, ans=0.125 2024-08-20 11:06:22,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4781180.0, ans=0.125 2024-08-20 11:06:24,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4781180.0, ans=0.2 2024-08-20 11:06:31,154 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 3950, loss[loss=0.0795, beats_loss=0.01419, ecapa_loss=0.000139, whisper_loss=0.06393, over 14410.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.08908, over 3775520.22 frames. ], batch size: 60, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:06:33,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4781280.0, ans=0.0 2024-08-20 11:06:37,908 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 41 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 11:06:39,864 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 11:07:02,758 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. 
limit=22.5 2024-08-20 11:07:08,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.311e+01 2.584e+01 2.900e+01 3.704e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-20 11:07:10,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4781380.0, ans=0.05 2024-08-20 11:07:16,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4781480.0, ans=0.2 2024-08-20 11:07:47,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4781580.0, ans=0.125 2024-08-20 11:08:00,288 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 11:08:00,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4781680.0, ans=0.0 2024-08-20 11:08:11,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4781680.0, ans=0.1 2024-08-20 11:08:15,348 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4000, loss[loss=0.08358, beats_loss=0.01373, ecapa_loss=0.0001104, whisper_loss=0.06874, over 22183.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001409, whisper_loss=0.08983, over 3828472.23 frames. 
], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:08:44,727 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4781880.0, ans=0.1 2024-08-20 11:09:08,558 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:09:30,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4782080.0, ans=0.0 2024-08-20 11:09:45,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4782180.0, ans=0.0 2024-08-20 11:09:51,824 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2024-08-20 11:09:56,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-20 11:10:01,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4782180.0, ans=0.0 2024-08-20 11:10:01,354 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-20 11:10:03,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4782180.0, ans=0.0 2024-08-20 11:10:07,431 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4050, loss[loss=0.09905, beats_loss=0.00943, ecapa_loss=0.0001281, whisper_loss=0.08834, over 18108.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.09014, over 3837569.43 frames. 
], batch size: 69, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:10:40,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4782380.0, ans=0.1 2024-08-20 11:10:50,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.323e+01 2.567e+01 2.848e+01 3.981e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-20 11:11:33,670 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 11:12:04,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4782680.0, ans=0.1 2024-08-20 11:12:09,989 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4100, loss[loss=0.08255, beats_loss=0.01111, ecapa_loss=0.0001292, whisper_loss=0.07015, over 19644.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001413, whisper_loss=0.08952, over 3834726.86 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:12:19,944 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 30 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 11:13:05,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4783080.0, ans=0.1 2024-08-20 11:13:07,598 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-20 11:13:25,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4783180.0, ans=0.125 2024-08-20 11:13:41,072 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4150, loss[loss=0.1023, beats_loss=0.01238, ecapa_loss=0.0001387, whisper_loss=0.08849, over 22142.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001409, whisper_loss=0.08972, over 3863103.35 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:14:01,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4783380.0, ans=0.125 2024-08-20 11:14:10,047 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 11 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 11:14:11,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.670e+01 3.158e+01 1.265e+02, threshold=5.340e+01, percent-clipped=2.0 2024-08-20 11:14:13,698 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 11:14:57,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-20 11:15:08,171 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4200, loss[loss=0.1003, beats_loss=0.01148, ecapa_loss=0.0001403, whisper_loss=0.0874, over 22793.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001408, whisper_loss=0.09028, over 3853338.51 frames. 
], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:15:09,587 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4783780.0, ans=0.125 2024-08-20 11:15:16,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4783780.0, ans=0.02 2024-08-20 11:15:16,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4783780.0, ans=0.1 2024-08-20 11:15:35,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4783880.0, ans=0.125 2024-08-20 11:15:38,711 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-20 11:15:58,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4783980.0, ans=0.125 2024-08-20 11:16:02,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4784080.0, ans=0.125 2024-08-20 11:16:21,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4784180.0, ans=0.125 2024-08-20 11:16:23,175 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 17 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 11:16:25,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4784180.0, ans=0.125 2024-08-20 11:16:37,537 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4250, loss[loss=0.08771, beats_loss=0.01104, ecapa_loss=0.0001314, whisper_loss=0.07536, over 16304.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.09034, over 3834527.70 frames. 
], batch size: 65, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:16:49,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4784280.0, ans=0.125 2024-08-20 11:17:06,360 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 11:17:07,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.233e+01 2.477e+01 2.798e+01 4.198e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-20 11:17:13,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4784480.0, ans=0.125 2024-08-20 11:17:14,723 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 31 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 11:17:29,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4784580.0, ans=0.0 2024-08-20 11:17:39,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4784580.0, ans=0.2 2024-08-20 11:17:45,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4784580.0, ans=0.0 2024-08-20 11:17:59,453 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4784680.0, ans=0.0 2024-08-20 11:18:05,739 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4300, loss[loss=0.1023, beats_loss=0.01128, ecapa_loss=0.0001224, whisper_loss=0.08975, over 17318.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001401, whisper_loss=0.09021, over 3802292.78 frames. 
], batch size: 69, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:18:09,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4784780.0, ans=0.0 2024-08-20 11:18:23,328 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 11:18:25,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4784880.0, ans=0.125 2024-08-20 11:18:31,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4784880.0, ans=0.125 2024-08-20 11:18:44,807 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 11:18:46,342 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 11:18:49,811 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 11:19:18,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-20 11:19:23,162 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 23 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 11:19:23,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4785180.0, ans=0.07 2024-08-20 11:19:30,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4785180.0, ans=0.0 2024-08-20 11:19:32,887 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4350, loss[loss=0.07767, beats_loss=0.01089, ecapa_loss=0.0001682, whisper_loss=0.06509, over 17696.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001402, whisper_loss=0.08984, over 3822243.34 frames. 
], batch size: 78, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:19:40,343 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 11:19:53,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4785380.0, ans=0.0 2024-08-20 11:19:55,906 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 36 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-20 11:20:02,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.221e+01 2.524e+01 2.843e+01 5.218e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 11:20:03,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4785380.0, ans=0.125 2024-08-20 11:20:09,634 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 11:20:17,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4785480.0, ans=0.2 2024-08-20 11:20:34,842 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 11:20:42,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4785680.0, ans=0.125 2024-08-20 11:21:01,241 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4400, loss[loss=0.09155, beats_loss=0.01137, ecapa_loss=0.0001144, whisper_loss=0.07904, over 21985.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.09005, over 3818332.18 frames. 
], batch size: 85, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:21:05,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4785780.0, ans=0.125 2024-08-20 11:21:06,002 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-08-20 11:21:21,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4785880.0, ans=0.0 2024-08-20 11:21:23,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4785880.0, ans=0.0 2024-08-20 11:21:23,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.62 vs. limit=10.0 2024-08-20 11:22:05,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4786080.0, ans=0.125 2024-08-20 11:22:12,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4786180.0, ans=0.1 2024-08-20 11:22:16,103 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.500e-01 2024-08-20 11:22:30,826 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4450, loss[loss=0.1077, beats_loss=0.009496, ecapa_loss=0.0001235, whisper_loss=0.09697, over 16499.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.08886, over 3788493.38 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:22:39,564 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 11:22:43,301 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
23 from LS+wenet, 11 from Vox, 17 fro AS 2024-08-20 11:22:48,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4786380.0, ans=0.2 2024-08-20 11:23:00,232 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.276e+01 2.505e+01 2.862e+01 6.840e+01, threshold=5.011e+01, percent-clipped=2.0 2024-08-20 11:23:20,843 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-20 11:23:24,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4786580.0, ans=0.0 2024-08-20 11:23:30,967 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4786580.0, ans=0.2 2024-08-20 11:23:39,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4786680.0, ans=0.1 2024-08-20 11:23:49,176 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 11:23:57,872 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4500, loss[loss=0.08471, beats_loss=0.01168, ecapa_loss=0.0001282, whisper_loss=0.07174, over 22958.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001407, whisper_loss=0.08907, over 3798097.44 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:23:58,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4786780.0, ans=0.2 2024-08-20 11:24:10,743 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 11:24:12,442 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
19 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 11:24:14,036 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 11:24:19,388 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 11:24:41,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4786980.0, ans=0.2 2024-08-20 11:24:42,374 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06896068155765533, model_norm_threshold=50.106773376464844 2024-08-20 11:24:42,544 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.994e+04, grad_sumsq=4.994e+04, orig_rms_sq=1.000e+00 2024-08-20 11:24:45,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4786980.0, ans=0.125 2024-08-20 11:25:00,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4787080.0, ans=0.125 2024-08-20 11:25:05,631 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4787180.0, ans=0.125 2024-08-20 11:25:16,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4787180.0, ans=0.125 2024-08-20 11:25:16,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4787180.0, ans=0.2 2024-08-20 11:25:23,342 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 11:25:25,028 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4550, loss[loss=0.07827, beats_loss=0.01267, ecapa_loss=0.0001395, whisper_loss=0.06421, over 14486.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01044, ecapa_loss=0.0001389, whisper_loss=0.08885, over 3793579.19 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:25:30,647 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 16 from LS+wenet, 31 from Vox, 46 fro AS 2024-08-20 11:25:32,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4787280.0, ans=0.2 2024-08-20 11:25:36,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.91 vs. limit=10.0 2024-08-20 11:25:55,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4787380.0, ans=0.0 2024-08-20 11:25:56,150 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.635e+01 2.231e+01 2.496e+01 2.825e+01 7.266e+02, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 11:26:09,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4787480.0, ans=0.125 2024-08-20 11:26:31,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4787580.0, ans=0.2 2024-08-20 11:26:32,423 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 11:26:52,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4787680.0, ans=0.125 2024-08-20 11:26:55,705 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4600, loss[loss=0.11, beats_loss=0.01124, ecapa_loss=0.0001624, whisper_loss=0.09713, over 22136.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001388, whisper_loss=0.08914, over 3802066.30 frames. 
], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:27:03,038 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 11:27:24,151 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-20 11:27:26,253 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 11:27:30,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4787980.0, ans=0.0 2024-08-20 11:27:36,988 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=12.0 2024-08-20 11:28:07,908 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 11:28:22,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4788180.0, ans=0.0 2024-08-20 11:28:26,279 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4650, loss[loss=0.07559, beats_loss=0.01187, ecapa_loss=0.0001466, whisper_loss=0.06226, over 15430.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.08942, over 3822003.11 frames. ], batch size: 64, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:28:35,337 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 11:28:56,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+01 2.344e+01 2.573e+01 2.789e+01 3.818e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-20 11:28:59,778 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-08-20 11:29:19,180 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
31 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 11:29:19,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4788580.0, ans=0.125 2024-08-20 11:29:23,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4788580.0, ans=0.125 2024-08-20 11:29:37,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4788680.0, ans=0.2 2024-08-20 11:29:56,545 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4700, loss[loss=0.1073, beats_loss=0.01065, ecapa_loss=0.0001417, whisper_loss=0.09521, over 21953.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08879, over 3841741.99 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:30:06,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4788780.0, ans=0.125 2024-08-20 11:30:20,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=22.5 2024-08-20 11:30:24,446 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 11:30:43,313 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 26 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 11:31:01,577 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 11:31:08,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4789180.0, ans=0.125 2024-08-20 11:31:22,466 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4750, loss[loss=0.08811, beats_loss=0.01075, ecapa_loss=0.0001609, whisper_loss=0.07575, over 14078.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08901, over 3830129.35 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:31:52,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.256e+01 2.490e+01 2.747e+01 3.725e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 11:32:29,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4789580.0, ans=0.0 2024-08-20 11:32:31,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4789580.0, ans=0.125 2024-08-20 11:32:55,389 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4800, loss[loss=0.1136, beats_loss=0.009397, ecapa_loss=0.0001397, whisper_loss=0.1028, over 21820.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001395, whisper_loss=0.09019, over 3836658.30 frames. ], batch size: 86, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:33:04,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=4789780.0, ans=15.0 2024-08-20 11:33:32,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.92 vs. limit=10.0 2024-08-20 11:33:58,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4790080.0, ans=0.2 2024-08-20 11:34:03,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-08-20 11:34:10,053 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
20 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 11:34:13,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4790080.0, ans=0.0 2024-08-20 11:34:17,523 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 11:34:25,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-20 11:34:42,512 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4850, loss[loss=0.0866, beats_loss=0.01375, ecapa_loss=0.0001406, whisper_loss=0.07144, over 22271.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001398, whisper_loss=0.08959, over 3848071.50 frames. ], batch size: 95, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:34:45,382 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 11:35:18,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4790380.0, ans=0.125 2024-08-20 11:35:20,934 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 18 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-20 11:35:27,413 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.360e+01 2.620e+01 2.940e+01 4.009e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-20 11:35:38,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4790480.0, ans=0.1 2024-08-20 11:36:11,065 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4790580.0, ans=0.2 2024-08-20 11:36:21,755 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 11:36:40,610 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 11:36:57,849 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4900, loss[loss=0.1078, beats_loss=0.01099, ecapa_loss=0.0001478, whisper_loss=0.09533, over 21360.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08966, over 3851515.19 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:37:20,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4790780.0, ans=0.125 2024-08-20 11:37:42,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4790880.0, ans=0.0 2024-08-20 11:37:58,907 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-20 11:38:24,269 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 11:38:24,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4791080.0, ans=0.09899494936611666 2024-08-20 11:38:46,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4791180.0, ans=0.1 2024-08-20 11:39:03,640 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4791180.0, ans=0.125 2024-08-20 11:39:11,009 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 4950, loss[loss=0.08675, beats_loss=0.01049, ecapa_loss=0.0001348, whisper_loss=0.07491, over 20716.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001406, whisper_loss=0.09007, over 3899357.20 frames. 
], batch size: 82, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:39:18,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4791280.0, ans=0.0 2024-08-20 11:39:27,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4791280.0, ans=0.04949747468305833 2024-08-20 11:39:46,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0 2024-08-20 11:39:54,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.340e+01 2.516e+01 2.750e+01 4.505e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-20 11:39:58,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4791380.0, ans=0.125 2024-08-20 11:40:00,850 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 11:40:18,496 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 29 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 11:40:28,045 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 11:40:41,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4791580.0, ans=0.125 2024-08-20 11:40:43,371 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. 
limit=15.0 2024-08-20 11:40:45,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4791580.0, ans=0.0 2024-08-20 11:41:03,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4791680.0, ans=0.07 2024-08-20 11:41:08,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4791680.0, ans=0.125 2024-08-20 11:41:10,269 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 11:41:16,283 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5000, loss[loss=0.07922, beats_loss=0.01277, ecapa_loss=0.0001418, whisper_loss=0.06503, over 20739.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.000139, whisper_loss=0.08971, over 3865811.77 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:41:50,551 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 11:42:54,273 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 11:43:18,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5050, loss[loss=0.09213, beats_loss=0.008767, ecapa_loss=0.0001329, whisper_loss=0.08203, over 14714.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001397, whisper_loss=0.08955, over 3839740.26 frames. 
], batch size: 56, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:43:59,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.312e+01 2.479e+01 2.896e+01 1.864e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-20 11:44:10,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4792480.0, ans=0.2 2024-08-20 11:44:16,310 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 11:44:23,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2024-08-20 11:44:26,264 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 15 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 11:44:32,986 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 29 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 11:44:53,557 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 11:45:16,443 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5100, loss[loss=0.07674, beats_loss=0.01225, ecapa_loss=0.0001304, whisper_loss=0.06319, over 15628.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001406, whisper_loss=0.08987, over 3810824.87 frames. 
], batch size: 65, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:45:42,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4792880.0, ans=0.0 2024-08-20 11:46:14,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4792980.0, ans=0.035 2024-08-20 11:46:55,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4793180.0, ans=0.125 2024-08-20 11:47:19,412 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5150, loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001395, whisper_loss=0.08958, over 21351.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.09053, over 3810445.71 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:47:26,951 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 11:47:43,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4793380.0, ans=0.2 2024-08-20 11:47:50,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4793380.0, ans=0.125 2024-08-20 11:48:01,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.291e+01 2.565e+01 2.823e+01 3.881e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 11:48:08,919 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-20 11:48:35,873 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 28 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-20 11:48:40,289 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 11:48:54,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4793580.0, ans=0.125 2024-08-20 11:48:56,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4793680.0, ans=0.125 2024-08-20 11:48:58,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4793680.0, ans=0.0 2024-08-20 11:49:14,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4793680.0, ans=0.1 2024-08-20 11:49:23,405 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5200, loss[loss=0.09802, beats_loss=0.01304, ecapa_loss=0.000127, whisper_loss=0.08371, over 21281.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001398, whisper_loss=0.09041, over 3811381.35 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:49:46,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4793780.0, ans=0.2 2024-08-20 11:49:55,530 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 11:50:03,649 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 11:50:03,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4793880.0, ans=0.0 2024-08-20 11:50:17,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4793980.0, ans=0.0 2024-08-20 11:50:18,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4793980.0, ans=0.0 2024-08-20 11:50:28,088 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 11:50:35,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2024-08-20 11:50:57,127 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 11:50:57,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4794080.0, ans=0.0 2024-08-20 11:51:27,314 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5250, loss[loss=0.08939, beats_loss=0.01045, ecapa_loss=0.0001271, whisper_loss=0.07767, over 21838.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001388, whisper_loss=0.09029, over 3784022.10 frames. 
], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:51:29,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4794280.0, ans=0.1 2024-08-20 11:51:49,678 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4794280.0, ans=0.0 2024-08-20 11:51:59,185 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4794380.0, ans=0.125 2024-08-20 11:52:07,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4794380.0, ans=0.125 2024-08-20 11:52:10,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4794380.0, ans=0.2 2024-08-20 11:52:11,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.300e+01 2.574e+01 2.899e+01 1.239e+02, threshold=5.148e+01, percent-clipped=2.0 2024-08-20 11:52:15,334 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 11:52:18,189 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 11:53:24,201 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 11:53:31,839 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5300, loss[loss=0.09789, beats_loss=0.01075, ecapa_loss=0.000141, whisper_loss=0.08574, over 19045.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001381, whisper_loss=0.08965, over 3784171.48 frames. 
], batch size: 76, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:53:36,473 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-08-20 11:54:02,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4794880.0, ans=0.0 2024-08-20 11:54:06,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-20 11:54:11,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4794880.0, ans=0.0 2024-08-20 11:54:33,916 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-08-20 11:54:38,154 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 11:54:54,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4795080.0, ans=0.1 2024-08-20 11:55:12,707 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-08-20 11:55:13,758 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 11:55:15,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4795180.0, ans=0.0 2024-08-20 11:55:28,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.08 vs. 
limit=12.0 2024-08-20 11:55:29,316 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5350, loss[loss=0.09594, beats_loss=0.009948, ecapa_loss=0.0001483, whisper_loss=0.08451, over 22432.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08961, over 3724708.73 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:55:47,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4795280.0, ans=0.125 2024-08-20 11:55:53,224 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-20 11:56:00,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4795380.0, ans=0.125 2024-08-20 11:56:10,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.312e+01 2.539e+01 2.746e+01 3.720e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-20 11:56:11,591 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 30 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 11:56:17,475 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 11:56:21,583 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 11:56:21,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4795480.0, ans=0.0 2024-08-20 11:57:04,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4795580.0, ans=0.125 2024-08-20 11:57:09,852 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.91 vs. 
limit=22.5 2024-08-20 11:57:17,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4795680.0, ans=0.125 2024-08-20 11:57:32,250 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5400, loss[loss=0.09236, beats_loss=0.01098, ecapa_loss=0.0001127, whisper_loss=0.08025, over 15836.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001389, whisper_loss=0.08937, over 3713042.68 frames. ], batch size: 60, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:57:34,047 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 22 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 11:57:34,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4795780.0, ans=0.2 2024-08-20 11:57:39,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4795780.0, ans=0.2 2024-08-20 11:57:41,669 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 11:58:19,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4795980.0, ans=0.125 2024-08-20 11:59:08,790 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 11:59:34,833 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 11:59:35,898 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5450, loss[loss=0.1126, beats_loss=0.009256, ecapa_loss=0.0001362, whisper_loss=0.102, over 21312.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.08921, over 3735878.71 frames. 
], batch size: 82, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:59:45,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4796280.0, ans=0.125 2024-08-20 11:59:45,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4796280.0, ans=0.125 2024-08-20 11:59:48,027 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 12:00:06,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4796380.0, ans=0.125 2024-08-20 12:00:09,202 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 12:00:10,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4796380.0, ans=0.1 2024-08-20 12:00:13,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-20 12:00:18,904 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.281e+01 2.461e+01 2.740e+01 4.925e+01, threshold=4.922e+01, percent-clipped=0.0 2024-08-20 12:00:22,817 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 12:01:43,298 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5500, loss[loss=0.09141, beats_loss=0.01229, ecapa_loss=8.328e-05, whisper_loss=0.07829, over 16131.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08897, over 3773481.59 frames. 
], batch size: 60, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:02:24,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4796880.0, ans=0.125 2024-08-20 12:02:39,141 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 12:02:41,665 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-20 12:02:42,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4796980.0, ans=0.125 2024-08-20 12:03:19,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4797180.0, ans=0.125 2024-08-20 12:03:45,340 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5550, loss[loss=0.09677, beats_loss=0.01014, ecapa_loss=0.0001551, whisper_loss=0.08508, over 15793.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.08907, over 3794473.58 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:03:59,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4797280.0, ans=0.5 2024-08-20 12:03:59,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4797280.0, ans=0.0 2024-08-20 12:04:07,783 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
40 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 12:04:26,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4797380.0, ans=0.125 2024-08-20 12:04:31,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.426e+01 2.728e+01 7.340e+01, threshold=4.852e+01, percent-clipped=2.0 2024-08-20 12:04:46,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4797480.0, ans=0.125 2024-08-20 12:04:55,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4797480.0, ans=0.125 2024-08-20 12:05:17,288 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.726e-01 2024-08-20 12:05:53,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4797780.0, ans=0.1 2024-08-20 12:05:54,002 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5600, loss[loss=0.09724, beats_loss=0.009743, ecapa_loss=0.0001528, whisper_loss=0.08597, over 16808.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001404, whisper_loss=0.08966, over 3803559.07 frames. ], batch size: 69, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:06:02,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4797780.0, ans=0.125 2024-08-20 12:06:21,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4797880.0, ans=0.125 2024-08-20 12:06:59,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.76 vs. 
limit=15.0 2024-08-20 12:07:34,259 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 12:07:46,988 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 12:07:49,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4798180.0, ans=0.0 2024-08-20 12:08:08,260 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5650, loss[loss=0.1194, beats_loss=0.009514, ecapa_loss=0.0001218, whisper_loss=0.1087, over 18706.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001417, whisper_loss=0.08986, over 3830756.34 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:08:51,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4798380.0, ans=0.125 2024-08-20 12:08:51,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.277e+01 2.490e+01 2.736e+01 3.914e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-20 12:08:53,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4798380.0, ans=0.0 2024-08-20 12:09:09,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4798480.0, ans=0.125 2024-08-20 12:09:17,794 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 17 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-20 12:09:52,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4798680.0, ans=0.125 2024-08-20 12:10:09,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2024-08-20 12:10:10,233 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5700, loss[loss=0.08865, beats_loss=0.0121, ecapa_loss=0.0001322, whisper_loss=0.07523, over 21936.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.08932, over 3815610.27 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:10:18,875 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 17 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-20 12:10:23,241 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 20 from LS+wenet, 35 from Vox, 28 fro AS 2024-08-20 12:10:53,457 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 14 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 12:11:10,400 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 12:11:40,091 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 12:12:09,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4799280.0, ans=0.0 2024-08-20 12:12:10,002 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5750, loss[loss=0.1051, beats_loss=0.01085, ecapa_loss=0.0001318, whisper_loss=0.09292, over 22395.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001414, whisper_loss=0.089, over 3797644.24 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:12:29,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4799280.0, ans=0.125 2024-08-20 12:12:32,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4799280.0, ans=0.0 2024-08-20 12:12:38,392 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 12:12:51,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.328e+01 2.562e+01 2.759e+01 3.925e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-20 12:12:53,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4799380.0, ans=0.125 2024-08-20 12:12:57,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4799480.0, ans=0.125 2024-08-20 12:13:27,299 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 12:13:35,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4799580.0, ans=0.125 2024-08-20 12:13:53,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4799680.0, ans=0.125 2024-08-20 12:13:58,081 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 18 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-20 12:14:00,429 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 14 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 12:14:10,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4799680.0, ans=0.0 2024-08-20 12:14:13,770 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5800, loss[loss=0.09561, beats_loss=0.01231, ecapa_loss=0.0001365, whisper_loss=0.08194, over 23299.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.08906, over 3820649.00 frames. 
], batch size: 94, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:14:15,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4799780.0, ans=0.0 2024-08-20 12:14:39,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4799880.0, ans=0.125 2024-08-20 12:14:58,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4799880.0, ans=0.0 2024-08-20 12:15:08,147 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 12:15:46,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4800080.0, ans=0.95 2024-08-20 12:16:06,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4800180.0, ans=0.125 2024-08-20 12:16:16,510 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 12:16:18,530 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5850, loss[loss=0.1063, beats_loss=0.008228, ecapa_loss=0.000178, whisper_loss=0.09633, over 14806.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.000141, whisper_loss=0.08926, over 3808968.51 frames. ], batch size: 59, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:16:18,773 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
10 from LS+wenet, 18 from Vox, 24 from AS 2024-08-20 12:16:57,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4800380.0, ans=0.125 2024-08-20 12:16:58,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.301e+01 2.506e+01 2.861e+01 6.399e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-20 12:17:01,209 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 from AS 2024-08-20 12:17:17,852 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 15 from LS+wenet, 14 from Vox, 37 from AS 2024-08-20 12:17:36,221 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 14 from Vox, 21 from AS 2024-08-20 12:17:39,088 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 from AS 2024-08-20 12:18:16,848 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5900, loss[loss=0.09652, beats_loss=0.009891, ecapa_loss=0.0001291, whisper_loss=0.08533, over 16069.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.08897, over 3762284.46 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:18:21,800 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2024-08-20 12:19:04,109 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.23 vs. limit=22.5 2024-08-20 12:19:19,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-20 12:19:20,131 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
31 from LS+wenet, 19 from Vox, 37 from AS 2024-08-20 12:19:23,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4800980.0, ans=0.1 2024-08-20 12:19:27,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4801080.0, ans=0.125 2024-08-20 12:19:40,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4801080.0, ans=0.125 2024-08-20 12:19:43,942 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 28 from LS+wenet, 14 from Vox, 37 from AS 2024-08-20 12:19:52,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4801180.0, ans=0.0 2024-08-20 12:19:53,250 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 from AS 2024-08-20 12:19:55,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=15.0 2024-08-20 12:20:15,125 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 5950, loss[loss=0.1006, beats_loss=0.007438, ecapa_loss=0.0001541, whisper_loss=0.09157, over 17271.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08848, over 3766667.26 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:20:33,256 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4801280.0, ans=0.1 2024-08-20 12:20:36,814 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
34 from LS+wenet, 29 from Vox, 26 from AS 2024-08-20 12:20:55,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.276e+01 2.504e+01 2.875e+01 3.990e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-20 12:20:58,222 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 19 from Vox, 42 from AS 2024-08-20 12:21:00,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4801480.0, ans=0.2 2024-08-20 12:21:02,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4801480.0, ans=0.1 2024-08-20 12:21:18,742 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 from AS 2024-08-20 12:21:19,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4801480.0, ans=0.125 2024-08-20 12:21:32,755 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-20 12:21:40,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4801580.0, ans=0.125 2024-08-20 12:21:50,827 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4801680.0, ans=0.125 2024-08-20 12:21:55,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4801680.0, ans=0.0 2024-08-20 12:22:09,487 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6000, loss[loss=0.1074, beats_loss=0.01079, ecapa_loss=0.000146, whisper_loss=0.09515, over 19212.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001407, whisper_loss=0.08898, over 3779069.22 frames. 
], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:22:09,488 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 12:22:45,716 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2489, over 931116.00 frames. 2024-08-20 12:23:08,962 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on SV_voxceleb1: loss=0.003913, beats_loss=0, ecapa_loss=0.0003913, whisper_loss=0, over 944235.00 frames. 2024-08-20 12:23:34,351 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([6.7224e-04, 2.6472e-03, 2.6717e-03, 3.9278e+00, 2.7307e-03, 4.2884e-02, 2.8652e-03, 1.4492e-02], device='cuda:2') 2024-08-20 12:23:58,658 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1245, 3.9722, 3.5130, 3.8883], device='cuda:2') 2024-08-20 12:24:09,088 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4113, 2.7363, 3.0196, 2.7123], device='cuda:2') 2024-08-20 12:24:44,171 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 12:24:44,175 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 12:24:52,698 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. 
limit=6.0 2024-08-20 12:25:48,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4802080.0, ans=0.125 2024-08-20 12:25:53,296 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4802080.0, ans=0.125 2024-08-20 12:26:23,412 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 from AS 2024-08-20 12:26:23,736 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:26:29,622 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-20 12:26:33,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4802180.0, ans=0.125 2024-08-20 12:26:36,937 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6050, loss[loss=0.09261, beats_loss=0.0099, ecapa_loss=0.0001371, whisper_loss=0.08134, over 13217.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.08897, over 3782262.10 frames. 
], batch size: 50, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:26:56,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4802280.0, ans=0.125 2024-08-20 12:27:05,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4802380.0, ans=0.125 2024-08-20 12:27:17,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.267e+01 2.529e+01 2.771e+01 4.356e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 12:27:19,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5 2024-08-20 12:27:44,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4802480.0, ans=0.2 2024-08-20 12:28:15,984 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 29 from LS+wenet, 11 from Vox, 28 from AS 2024-08-20 12:28:25,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4802680.0, ans=0.2 2024-08-20 12:28:34,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4802780.0, ans=0.125 2024-08-20 12:28:35,571 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6100, loss[loss=0.1117, beats_loss=0.01077, ecapa_loss=0.0001279, whisper_loss=0.09961, over 22521.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.0898, over 3798853.65 frames. 
], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:28:44,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4802780.0, ans=0.09899494936611666 2024-08-20 12:28:45,225 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.19 vs. limit=15.0 2024-08-20 12:28:52,378 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 from AS 2024-08-20 12:29:04,339 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 from AS 2024-08-20 12:29:16,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4802880.0, ans=0.125 2024-08-20 12:29:20,091 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=12.0 2024-08-20 12:29:21,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4802980.0, ans=0.125 2024-08-20 12:30:06,508 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4803180.0, ans=0.0 2024-08-20 12:30:21,466 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 12:30:29,504 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6150, loss[loss=0.1075, beats_loss=0.01123, ecapa_loss=0.0001411, whisper_loss=0.09483, over 13603.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001387, whisper_loss=0.08963, over 3816077.73 frames. 
], batch size: 54, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:30:46,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4803280.0, ans=0.0 2024-08-20 12:31:03,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=15.0 2024-08-20 12:31:06,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4803380.0, ans=0.1 2024-08-20 12:31:07,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.457e+01 2.787e+01 2.276e+02, threshold=4.913e+01, percent-clipped=4.0 2024-08-20 12:31:40,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=22.5 2024-08-20 12:31:40,791 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 19 from LS+wenet, 15 from Vox, 45 from AS 2024-08-20 12:31:45,728 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-20 12:32:02,451 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.93 vs. limit=22.5 2024-08-20 12:32:26,734 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6200, loss[loss=0.1051, beats_loss=0.009889, ecapa_loss=0.0001561, whisper_loss=0.0936, over 15006.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001396, whisper_loss=0.08936, over 3786363.35 frames. 
], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:32:35,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4803780.0, ans=0.0 2024-08-20 12:33:06,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4803880.0, ans=0.2 2024-08-20 12:33:07,591 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-20 12:33:09,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. limit=22.5 2024-08-20 12:34:02,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4804180.0, ans=0.125 2024-08-20 12:34:13,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4804180.0, ans=0.1 2024-08-20 12:34:21,876 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6250, loss[loss=0.1088, beats_loss=0.01147, ecapa_loss=0.0001457, whisper_loss=0.09588, over 22387.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001403, whisper_loss=0.09052, over 3832032.02 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:34:33,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4804280.0, ans=0.125 2024-08-20 12:34:43,992 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 30 from Vox, 22 from AS 2024-08-20 12:34:51,048 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
26 from LS+wenet, 27 from Vox, 41 from AS 2024-08-20 12:35:02,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.372e+01 2.624e+01 2.911e+01 4.545e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-20 12:35:13,763 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-20 12:35:19,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4804480.0, ans=0.125 2024-08-20 12:35:24,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4804480.0, ans=0.125 2024-08-20 12:35:38,167 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 from AS 2024-08-20 12:36:01,851 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 33 from LS+wenet, 18 from Vox, 30 from AS 2024-08-20 12:36:09,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4804680.0, ans=0.125 2024-08-20 12:36:17,588 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6300, loss[loss=0.09969, beats_loss=0.01125, ecapa_loss=0.0001364, whisper_loss=0.08708, over 21611.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.09013, over 3823207.68 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:36:22,696 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS 2024-08-20 12:36:30,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4804780.0, ans=0.0 2024-08-20 12:36:57,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.29 vs. 
limit=22.5 2024-08-20 12:37:21,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4804980.0, ans=0.125 2024-08-20 12:37:23,008 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 from AS 2024-08-20 12:38:13,161 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6350, loss[loss=0.0989, beats_loss=0.01124, ecapa_loss=0.0001273, whisper_loss=0.08638, over 15299.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.09021, over 3828634.82 frames. ], batch size: 63, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:38:42,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4805380.0, ans=0.125 2024-08-20 12:38:53,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.243e+01 2.449e+01 2.846e+01 7.911e+01, threshold=4.899e+01, percent-clipped=2.0 2024-08-20 12:39:16,078 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-20 12:39:37,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4805580.0, ans=0.0 2024-08-20 12:39:39,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4805580.0, ans=0.2 2024-08-20 12:39:56,700 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4805680.0, ans=0.125 2024-08-20 12:40:15,140 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6400, loss[loss=0.08872, beats_loss=0.01188, ecapa_loss=0.0001498, whisper_loss=0.07534, over 23239.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001398, whisper_loss=0.09006, over 3840629.82 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:40:25,216 WARNING [optim.py:496] (2/4) Scaling gradients by 0.015770763158798218, model_norm_threshold=48.98588180541992 2024-08-20 12:40:25,384 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.332e+06, grad_sumsq=1.479e+05, orig_rms_sq=9.003e+00 2024-08-20 12:40:43,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4805880.0, ans=0.0 2024-08-20 12:41:09,855 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 from AS 2024-08-20 12:41:56,800 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 from AS 2024-08-20 12:42:10,954 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6450, loss[loss=0.09647, beats_loss=0.01142, ecapa_loss=0.0001211, whisper_loss=0.08384, over 15008.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08997, over 3823416.09 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:42:27,412 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 from AS 2024-08-20 12:42:58,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.273e+01 2.543e+01 2.928e+01 3.106e+03, threshold=5.086e+01, percent-clipped=1.0 2024-08-20 12:43:00,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4806380.0, ans=0.2 2024-08-20 12:43:08,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4806480.0, ans=0.0 2024-08-20 12:43:14,734 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
15 from LS+wenet, 30 from Vox, 27 from AS 2024-08-20 12:43:18,443 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07709788531064987, model_norm_threshold=50.86475372314453 2024-08-20 12:43:18,613 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.952e+04, grad_sumsq=6.952e+04, orig_rms_sq=1.000e+00 2024-08-20 12:43:23,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4806480.0, ans=0.1 2024-08-20 12:43:33,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4806580.0, ans=0.0 2024-08-20 12:43:37,584 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-20 12:43:40,454 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 from AS 2024-08-20 12:43:41,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4806580.0, ans=0.1 2024-08-20 12:44:01,789 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 11 from Vox, 24 from AS 2024-08-20 12:44:07,339 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6500, loss[loss=0.1118, beats_loss=0.01001, ecapa_loss=0.0001151, whisper_loss=0.1007, over 19084.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.09003, over 3807397.54 frames. 
], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:44:13,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4806780.0, ans=0.5 2024-08-20 12:44:20,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-20 12:44:38,842 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 18 from LS+wenet, 25 from Vox, 36 from AS 2024-08-20 12:44:52,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4806980.0, ans=0.125 2024-08-20 12:44:55,977 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 15 from LS+wenet, 27 from Vox, 42 from AS 2024-08-20 12:45:08,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4807080.0, ans=0.0 2024-08-20 12:45:14,386 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS 2024-08-20 12:45:16,427 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 from AS 2024-08-20 12:45:34,518 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.309e-01 2024-08-20 12:45:35,930 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS 2024-08-20 12:45:40,564 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6550, loss[loss=0.1069, beats_loss=0.008852, ecapa_loss=0.0001361, whisper_loss=0.09672, over 15232.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001402, whisper_loss=0.08985, over 3814212.27 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:45:52,208 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
27 from LS+wenet, 16 from Vox, 29 from AS 2024-08-20 12:45:53,893 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 37 from LS+wenet, 23 from Vox, 31 from AS 2024-08-20 12:45:56,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4807380.0, ans=0.125 2024-08-20 12:46:05,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4807380.0, ans=0.125 2024-08-20 12:46:09,840 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.06 vs. limit=15.0 2024-08-20 12:46:10,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.299e+01 2.491e+01 2.817e+01 6.597e+02, threshold=4.982e+01, percent-clipped=1.0 2024-08-20 12:46:27,911 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 from AS 2024-08-20 12:46:38,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4807480.0, ans=0.0 2024-08-20 12:47:24,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2024-08-20 12:47:29,579 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 from AS 2024-08-20 12:47:32,730 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6600, loss[loss=0.08474, beats_loss=0.01011, ecapa_loss=0.0001418, whisper_loss=0.07321, over 16009.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001402, whisper_loss=0.09034, over 3826680.38 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:47:37,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.04 vs. 
limit=22.5 2024-08-20 12:48:08,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4807880.0, ans=0.125 2024-08-20 12:49:30,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4808180.0, ans=0.125 2024-08-20 12:49:36,564 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6650, loss[loss=0.1146, beats_loss=0.01058, ecapa_loss=0.00014, whisper_loss=0.1026, over 19435.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09051, over 3826986.03 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:49:45,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4808280.0, ans=0.04949747468305833 2024-08-20 12:50:10,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4808380.0, ans=0.125 2024-08-20 12:50:15,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.590e+01 2.324e+01 2.596e+01 3.081e+01 4.862e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-20 12:51:19,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4808680.0, ans=0.125 2024-08-20 12:51:21,075 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 from AS 2024-08-20 12:51:32,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4808780.0, ans=0.0 2024-08-20 12:51:33,289 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6700, loss[loss=0.1022, beats_loss=0.008841, ecapa_loss=0.0001269, whisper_loss=0.09214, over 13642.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.09121, over 3832361.97 frames. ], batch size: 51, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:52:01,786 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 from AS 2024-08-20 12:52:23,835 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 from AS 2024-08-20 12:52:25,738 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 13 from LS+wenet, 20 from Vox, 21 from AS 2024-08-20 12:52:27,505 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 from AS 2024-08-20 12:52:36,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4809080.0, ans=0.1 2024-08-20 12:52:41,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4809080.0, ans=0.0 2024-08-20 12:53:05,835 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6750, loss[loss=0.09566, beats_loss=0.01036, ecapa_loss=0.000149, whisper_loss=0.08381, over 15763.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001402, whisper_loss=0.09103, over 3846756.21 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:53:07,998 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-20 12:53:13,486 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-20 12:53:20,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4809280.0, ans=0.125 2024-08-20 12:53:35,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.304e+01 2.494e+01 2.775e+01 4.602e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-20 12:53:35,996 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-20 12:53:48,264 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.07 vs. limit=22.5 2024-08-20 12:53:54,637 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-20 12:54:04,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-20 12:54:08,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4809580.0, ans=0.0 2024-08-20 12:54:25,631 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 from AS 2024-08-20 12:54:32,338 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6800, loss[loss=0.1352, beats_loss=0.007853, ecapa_loss=0.0001548, whisper_loss=0.1258, over 20241.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01027, ecapa_loss=0.0001403, whisper_loss=0.09123, over 3831129.55 frames. ], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:54:41,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4809780.0, ans=0.1 2024-08-20 12:54:57,314 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
25 from LS+wenet, 7 from Vox, 29 from AS
2024-08-20 12:55:03,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4809880.0, ans=0.125
2024-08-20 12:55:14,174 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 35 from LS+wenet, 15 from Vox, 27 from AS
2024-08-20 12:55:28,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4810080.0, ans=0.05
2024-08-20 12:55:31,256 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 from AS
2024-08-20 12:55:55,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4810180.0, ans=0.125
2024-08-20 12:55:59,939 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6850, loss[loss=0.09272, beats_loss=0.01011, ecapa_loss=0.0001371, whisper_loss=0.08124, over 16304.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001405, whisper_loss=0.09171, over 3827257.09 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:56:10,286 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 12 from LS+wenet, 26 from Vox, 26 from AS
2024-08-20 12:56:19,009 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 from AS
2024-08-20 12:56:28,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.376e+01 2.516e+01 2.843e+01 1.582e+02, threshold=5.033e+01, percent-clipped=2.0
2024-08-20 12:56:38,697 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts.
22 from LS+wenet, 18 from Vox, 27 from AS
2024-08-20 12:56:38,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4810480.0, ans=0.125
2024-08-20 12:56:44,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4810480.0, ans=0.0
2024-08-20 12:56:49,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4810480.0, ans=0.0
2024-08-20 12:56:49,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4810480.0, ans=0.125
2024-08-20 12:57:11,897 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS
2024-08-20 12:57:25,375 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 11 from Vox, 25 from AS
2024-08-20 12:57:29,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4810780.0, ans=0.0
2024-08-20 12:57:30,362 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6900, loss[loss=0.1009, beats_loss=0.01031, ecapa_loss=0.0001466, whisper_loss=0.08917, over 20127.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.09165, over 3834650.95 frames. ], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:57:30,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4810780.0, ans=0.0
2024-08-20 12:57:45,180 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
31 from LS+wenet, 13 from Vox, 46 from AS
2024-08-20 12:57:45,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4810780.0, ans=0.1
2024-08-20 12:58:02,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2024-08-20 12:58:06,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-08-20 12:58:25,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4811080.0, ans=0.1
2024-08-20 12:58:31,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4811080.0, ans=0.0
2024-08-20 12:58:38,976 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 from AS
2024-08-20 12:58:54,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4811180.0, ans=0.09899494936611666
2024-08-20 12:58:57,828 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 from AS
2024-08-20 12:58:59,273 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 6950, loss[loss=0.1042, beats_loss=0.007133, ecapa_loss=0.0001315, whisper_loss=0.09578, over 17383.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001398, whisper_loss=0.09039, over 3850749.46 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 12:58:59,516 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts.
18 from LS+wenet, 14 from Vox, 27 from AS
2024-08-20 12:59:09,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4811280.0, ans=0.125
2024-08-20 12:59:24,653 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 from AS
2024-08-20 12:59:30,815 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.235e+01 2.363e+01 2.831e+01 5.596e+01, threshold=4.726e+01, percent-clipped=1.0
2024-08-20 12:59:49,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4811480.0, ans=0.125
2024-08-20 12:59:55,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4811580.0, ans=0.1
2024-08-20 13:00:19,514 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2024-08-20 13:00:29,827 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7000, loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001294, whisper_loss=0.08929, over 19342.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001394, whisper_loss=0.08972, over 3818077.70 frames. ], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:00:38,794 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.339e+01
2024-08-20 13:00:51,379 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 from AS
2024-08-20 13:00:57,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4811880.0, ans=0.125
2024-08-20 13:00:57,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.12 vs.
limit=15.0
2024-08-20 13:01:01,504 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.806e-01
2024-08-20 13:01:28,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4812080.0, ans=0.0
2024-08-20 13:01:29,472 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS
2024-08-20 13:01:29,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4812080.0, ans=0.125
2024-08-20 13:01:46,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4812180.0, ans=0.125
2024-08-20 13:01:50,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4812180.0, ans=0.2
2024-08-20 13:01:53,319 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 from AS
2024-08-20 13:01:56,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4812180.0, ans=0.0
2024-08-20 13:01:59,336 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7050, loss[loss=0.08805, beats_loss=0.008611, ecapa_loss=0.0001601, whisper_loss=0.07784, over 14266.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001383, whisper_loss=0.08994, over 3812162.38 frames.
], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:02:06,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4812280.0, ans=0.125
2024-08-20 13:02:09,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4812280.0, ans=0.125
2024-08-20 13:02:21,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0
2024-08-20 13:02:24,725 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 from AS
2024-08-20 13:02:31,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.241e+01 2.463e+01 2.779e+01 3.668e+01, threshold=4.925e+01, percent-clipped=0.0
2024-08-20 13:02:35,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.29 vs.
limit=15.0
2024-08-20 13:02:38,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4812480.0, ans=0.125
2024-08-20 13:02:57,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4812580.0, ans=0.2
2024-08-20 13:02:57,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4812580.0, ans=0.2
2024-08-20 13:02:58,614 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4812580.0, ans=0.2
2024-08-20 13:03:00,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4812580.0, ans=0.0
2024-08-20 13:03:31,199 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7100, loss[loss=0.116, beats_loss=0.009499, ecapa_loss=0.0001363, whisper_loss=0.1051, over 21493.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001385, whisper_loss=0.08974, over 3831506.41 frames. ], batch size: 84, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:03:35,608 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 from AS
2024-08-20 13:03:55,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4812880.0, ans=0.0
2024-08-20 13:04:10,621 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4812980.0, ans=0.1
2024-08-20 13:04:20,108 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 from AS
2024-08-20 13:04:28,272 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts.
30 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 13:04:30,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4813080.0, ans=0.125
2024-08-20 13:04:39,124 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2024-08-20 13:04:40,624 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4813080.0, ans=0.125
2024-08-20 13:04:43,374 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 from AS
2024-08-20 13:05:04,633 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7150, loss[loss=0.101, beats_loss=0.0108, ecapa_loss=0.0001033, whisper_loss=0.08913, over 16356.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.08959, over 3825782.09 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:05:36,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.222e+01 2.476e+01 2.874e+01 3.378e+02, threshold=4.952e+01, percent-clipped=1.0
2024-08-20 13:05:40,050 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 14 from LS+wenet, 16 from Vox, 30 from AS
2024-08-20 13:06:00,010 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS
2024-08-20 13:06:07,698 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts.
18 from LS+wenet, 14 from Vox, 21 from AS
2024-08-20 13:06:17,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4813680.0, ans=0.125
2024-08-20 13:06:34,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4813780.0, ans=0.0
2024-08-20 13:06:36,018 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7200, loss[loss=0.09487, beats_loss=0.01191, ecapa_loss=0.0001183, whisper_loss=0.08178, over 22053.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001391, whisper_loss=0.08911, over 3857997.27 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:06:49,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=22.5
2024-08-20 13:07:09,280 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 13:07:24,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4813980.0, ans=10.0
2024-08-20 13:07:56,320 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09890901297330856, model_norm_threshold=49.522193908691406
2024-08-20 13:07:56,486 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.024e+04, grad_sumsq=9.178e+03, orig_rms_sq=3.294e+00
2024-08-20 13:07:56,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4814180.0, ans=0.2
2024-08-20 13:08:08,123 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts.
24 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 13:08:10,562 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7250, loss[loss=0.1069, beats_loss=0.008834, ecapa_loss=0.0001473, whisper_loss=0.09656, over 17496.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001387, whisper_loss=0.08944, over 3824211.32 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:08:25,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4814280.0, ans=0.125
2024-08-20 13:08:41,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.256e+01 2.635e+01 2.926e+01 5.007e+02, threshold=5.270e+01, percent-clipped=2.0
2024-08-20 13:08:53,223 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4814480.0, ans=0.125
2024-08-20 13:09:30,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4814680.0, ans=0.0
2024-08-20 13:09:39,710 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7300, loss[loss=0.102, beats_loss=0.01129, ecapa_loss=0.0001442, whisper_loss=0.08925, over 22552.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001394, whisper_loss=0.0898, over 3818324.09 frames.
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:10:02,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4814880.0, ans=0.125
2024-08-20 13:10:06,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4814880.0, ans=0.125
2024-08-20 13:10:09,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4814880.0, ans=0.5
2024-08-20 13:10:13,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4814980.0, ans=0.125
2024-08-20 13:10:18,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4814980.0, ans=0.0
2024-08-20 13:10:27,593 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4814980.0, ans=0.125
2024-08-20 13:10:30,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4814980.0, ans=0.05
2024-08-20 13:10:35,675 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4815080.0, ans=0.0
2024-08-20 13:10:54,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4815080.0, ans=0.1
2024-08-20 13:11:10,967 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0
2024-08-20 13:11:17,557 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7350, loss[loss=0.091, beats_loss=0.01241, ecapa_loss=0.0001281, whisper_loss=0.07731, over 21326.00 frames.
], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.08974, over 3827732.96 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:11:23,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4815280.0, ans=0.125
2024-08-20 13:11:29,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=12.0
2024-08-20 13:11:50,111 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 from AS
2024-08-20 13:11:50,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4815380.0, ans=0.0
2024-08-20 13:11:51,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4815380.0, ans=0.125
2024-08-20 13:11:52,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.308e+01 2.500e+01 2.769e+01 3.790e+01, threshold=5.001e+01, percent-clipped=0.0
2024-08-20 13:11:54,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4815380.0, ans=0.125
2024-08-20 13:11:56,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4815380.0, ans=0.2
2024-08-20 13:13:01,532 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7400, loss[loss=0.1087, beats_loss=0.009586, ecapa_loss=0.0001692, whisper_loss=0.09741, over 15265.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001411, whisper_loss=0.09015, over 3844750.03 frames.
], batch size: 65, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:13:02,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=4815780.0, ans=0.2
2024-08-20 13:13:04,124 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 26 from Vox, 34 from AS
2024-08-20 13:13:19,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4815780.0, ans=10.0
2024-08-20 13:13:49,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4815980.0, ans=0.125
2024-08-20 13:14:04,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4816080.0, ans=0.0
2024-08-20 13:14:14,303 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0
2024-08-20 13:14:38,169 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7450, loss[loss=0.09962, beats_loss=0.008245, ecapa_loss=0.0001901, whisper_loss=0.08947, over 15497.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01024, ecapa_loss=0.0001426, whisper_loss=0.09073, over 3821790.15 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:14:46,895 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 13:15:07,983 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS
2024-08-20 13:15:11,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.199e+01 2.442e+01 2.667e+01 5.088e+01, threshold=4.883e+01, percent-clipped=1.0
2024-08-20 13:15:21,594 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts.
21 from LS+wenet, 22 from Vox, 51 from AS
2024-08-20 13:15:39,390 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 from AS
2024-08-20 13:15:58,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4816680.0, ans=0.125
2024-08-20 13:16:04,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4816680.0, ans=0.125
2024-08-20 13:16:09,832 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 23 from LS+wenet, 13 from Vox, 18 from AS
2024-08-20 13:16:19,362 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7500, loss[loss=0.07923, beats_loss=0.01318, ecapa_loss=9.59e-05, whisper_loss=0.06509, over 17189.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01017, ecapa_loss=0.0001414, whisper_loss=0.09072, over 3769734.22 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:16:22,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4816780.0, ans=0.0
2024-08-20 13:16:22,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4816780.0, ans=0.125
2024-08-20 13:16:24,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4816780.0, ans=0.0
2024-08-20 13:16:34,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4816780.0, ans=0.1
2024-08-20 13:16:46,732 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4816880.0, ans=0.0
2024-08-20 13:16:46,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4816880.0, ans=0.125
2024-08-20 13:16:52,975 INFO
[train_multi_KD3.py:845] (2/4) A total of 92 cuts. 40 from LS+wenet, 17 from Vox, 35 from AS
2024-08-20 13:17:00,077 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 13:18:01,089 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7550, loss[loss=0.1024, beats_loss=0.009852, ecapa_loss=0.0001268, whisper_loss=0.09132, over 18366.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01018, ecapa_loss=0.0001403, whisper_loss=0.09088, over 3781407.07 frames. ], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:18:03,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4817280.0, ans=0.125
2024-08-20 13:18:07,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4817280.0, ans=0.05
2024-08-20 13:18:13,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4817280.0, ans=0.125
2024-08-20 13:18:20,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs.
limit=22.5
2024-08-20 13:18:34,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.221e+01 2.519e+01 2.793e+01 1.462e+02, threshold=5.038e+01, percent-clipped=2.0
2024-08-20 13:18:38,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4817380.0, ans=0.0
2024-08-20 13:18:49,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4817480.0, ans=0.125
2024-08-20 13:18:53,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4817480.0, ans=0.125
2024-08-20 13:19:15,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4817580.0, ans=0.125
2024-08-20 13:19:27,163 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4817680.0, ans=0.2
2024-08-20 13:19:42,580 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7600, loss[loss=0.09524, beats_loss=0.01168, ecapa_loss=0.0001598, whisper_loss=0.08196, over 21360.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01024, ecapa_loss=0.0001406, whisper_loss=0.09097, over 3802285.59 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:19:51,619 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 from AS
2024-08-20 13:19:58,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4817780.0, ans=0.1
2024-08-20 13:20:14,246 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 13:20:17,060 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts.
21 from LS+wenet, 17 from Vox, 23 from AS
2024-08-20 13:20:39,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4818080.0, ans=0.0
2024-08-20 13:21:19,815 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7650, loss[loss=0.08158, beats_loss=0.007709, ecapa_loss=0.0001452, whisper_loss=0.07242, over 14209.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01021, ecapa_loss=0.000142, whisper_loss=0.09032, over 3780041.88 frames. ], batch size: 54, lr: 1.85e-03, grad_scale: 1.152921504606847e+18
2024-08-20 13:21:30,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4818280.0, ans=0.0
2024-08-20 13:21:32,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0
2024-08-20 13:21:52,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.373e+01 2.628e+01 2.994e+01 5.178e+01, threshold=5.256e+01, percent-clipped=1.0
2024-08-20 13:22:08,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4818480.0, ans=0.125
2024-08-20 13:22:17,429 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 from AS
2024-08-20 13:22:22,309 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4818580.0, ans=15.0
2024-08-20 13:22:46,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4818680.0, ans=0.2
2024-08-20 13:22:48,503 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts.
21 from LS+wenet, 15 from Vox, 26 from AS
2024-08-20 13:22:57,176 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7700, loss[loss=0.1182, beats_loss=0.01038, ecapa_loss=0.0001262, whisper_loss=0.1065, over 22339.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01022, ecapa_loss=0.0001413, whisper_loss=0.09024, over 3742459.29 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 1.152921504606847e+18
2024-08-20 13:23:00,401 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4818780.0, ans=0.125
2024-08-20 13:23:34,174 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 from AS
2024-08-20 13:24:09,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.46 vs. limit=22.5
2024-08-20 13:24:19,399 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 19 from LS+wenet, 12 from Vox, 30 from AS
2024-08-20 13:24:25,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4819180.0, ans=0.0
2024-08-20 13:24:26,758 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4819180.0, ans=0.125
2024-08-20 13:24:39,019 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7750, loss[loss=0.09542, beats_loss=0.01094, ecapa_loss=0.0001853, whisper_loss=0.08263, over 20485.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.0001405, whisper_loss=0.09014, over 3776211.21 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 1.152921504606847e+18
2024-08-20 13:24:40,957 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts.
28 from LS+wenet, 29 from Vox, 28 from AS
2024-08-20 13:24:47,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4819280.0, ans=0.1
2024-08-20 13:24:48,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4819280.0, ans=0.125
2024-08-20 13:25:07,143 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0
2024-08-20 13:25:08,374 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 from AS
2024-08-20 13:25:16,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.228e+01 2.412e+01 2.689e+01 6.555e+01, threshold=4.823e+01, percent-clipped=1.0
2024-08-20 13:25:21,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4819480.0, ans=0.125
2024-08-20 13:25:28,801 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 14 from LS+wenet, 14 from Vox, 24 from AS
2024-08-20 13:25:46,419 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 from AS
2024-08-20 13:26:17,572 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7800, loss[loss=0.1206, beats_loss=0.01195, ecapa_loss=8.232e-05, whisper_loss=0.1078, over 23386.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.0001391, whisper_loss=0.09035, over 3794367.33 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17
2024-08-20 13:27:21,870 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts.
29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 13:27:24,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4820080.0, ans=0.125 2024-08-20 13:27:34,140 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-20 13:27:52,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4820180.0, ans=0.0 2024-08-20 13:27:58,226 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7850, loss[loss=0.09921, beats_loss=0.01014, ecapa_loss=0.0001314, whisper_loss=0.08775, over 22072.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001377, whisper_loss=0.09027, over 3824275.15 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:27:58,471 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 13:28:04,455 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4820280.0, ans=0.2 2024-08-20 13:28:04,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4820280.0, ans=0.125 2024-08-20 13:28:25,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-20 13:28:34,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.297e+01 2.493e+01 2.759e+01 4.989e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-20 13:28:42,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-20 13:28:56,031 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
13 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 13:29:30,146 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 13:29:40,348 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7900, loss[loss=0.08913, beats_loss=0.01086, ecapa_loss=0.000147, whisper_loss=0.0768, over 14916.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001378, whisper_loss=0.09084, over 3831745.09 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:29:49,187 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.05 vs. limit=6.0 2024-08-20 13:30:10,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=15.0 2024-08-20 13:30:21,279 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 13:31:06,123 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2024-08-20 13:31:19,936 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 7950, loss[loss=0.1073, beats_loss=0.009605, ecapa_loss=0.0001364, whisper_loss=0.09628, over 20288.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.000138, whisper_loss=0.09102, over 3844448.96 frames. 
], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:31:22,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4821280.0, ans=0.125 2024-08-20 13:31:26,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4821280.0, ans=0.0 2024-08-20 13:31:40,114 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 13:31:56,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.297e+01 2.495e+01 2.761e+01 3.642e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 13:32:06,813 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-20 13:32:19,525 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 13:32:47,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2024-08-20 13:32:48,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4821680.0, ans=0.125 2024-08-20 13:32:55,136 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8000, loss[loss=0.09485, beats_loss=0.01458, ecapa_loss=7.217e-05, whisper_loss=0.07956, over 15323.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001364, whisper_loss=0.09059, over 3849807.62 frames. 
], batch size: 56, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:32:56,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4821780.0, ans=0.125 2024-08-20 13:33:06,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4821780.0, ans=0.1 2024-08-20 13:33:09,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4821780.0, ans=0.125 2024-08-20 13:33:10,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4821780.0, ans=6.0 2024-08-20 13:33:48,924 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 17 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 13:33:56,869 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 13:34:06,608 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 18 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-20 13:34:23,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4822180.0, ans=0.125 2024-08-20 13:34:32,036 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8050, loss[loss=0.1118, beats_loss=0.01252, ecapa_loss=0.0001016, whisper_loss=0.09825, over 24753.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001376, whisper_loss=0.09097, over 3845498.16 frames. 
], batch size: 96, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:35:07,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.231e+01 2.467e+01 2.722e+01 2.720e+02, threshold=4.934e+01, percent-clipped=1.0 2024-08-20 13:35:11,133 WARNING [optim.py:496] (2/4) Scaling gradients by 0.020375000312924385, model_norm_threshold=49.342281341552734 2024-08-20 13:35:11,302 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.714e+05, grad_sumsq=7.714e+05, orig_rms_sq=1.000e+00 2024-08-20 13:35:18,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4822480.0, ans=0.1 2024-08-20 13:35:39,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4822580.0, ans=0.0 2024-08-20 13:35:54,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4822680.0, ans=0.125 2024-08-20 13:35:57,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4822680.0, ans=0.0 2024-08-20 13:36:03,729 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4822680.0, ans=0.2 2024-08-20 13:36:10,108 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8100, loss[loss=0.07769, beats_loss=0.01057, ecapa_loss=0.0001479, whisper_loss=0.06564, over 18663.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001375, whisper_loss=0.09074, over 3782896.59 frames. 
], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:36:20,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0 2024-08-20 13:36:27,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4822880.0, ans=0.125 2024-08-20 13:36:52,096 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 13:37:10,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4823080.0, ans=15.0 2024-08-20 13:37:12,866 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=12.0 2024-08-20 13:37:29,962 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 13:37:49,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4823280.0, ans=0.125 2024-08-20 13:37:50,174 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8150, loss[loss=0.1058, beats_loss=0.01, ecapa_loss=0.000134, whisper_loss=0.0945, over 19961.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001377, whisper_loss=0.09092, over 3799008.39 frames. 
], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:38:22,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4823380.0, ans=0.125 2024-08-20 13:38:23,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.223e+01 2.514e+01 2.822e+01 2.422e+03, threshold=5.028e+01, percent-clipped=2.0 2024-08-20 13:38:30,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4823480.0, ans=0.1 2024-08-20 13:38:50,242 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.050e+05 2024-08-20 13:38:52,277 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 36 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 13:39:17,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4823680.0, ans=0.125 2024-08-20 13:39:26,039 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8200, loss[loss=0.08916, beats_loss=0.01019, ecapa_loss=0.0001449, whisper_loss=0.07752, over 22808.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001375, whisper_loss=0.09072, over 3807513.69 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:39:32,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4823780.0, ans=0.0 2024-08-20 13:39:43,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4823780.0, ans=0.0 2024-08-20 13:40:12,498 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 13:40:13,872 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.86 vs. 
limit=15.0 2024-08-20 13:40:17,351 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-20 13:40:36,637 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 13:40:41,188 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 13:40:42,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4824080.0, ans=0.1 2024-08-20 13:40:50,152 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 18 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 13:40:53,934 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 13:41:02,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4824180.0, ans=0.07 2024-08-20 13:41:04,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4824280.0, ans=0.125 2024-08-20 13:41:05,392 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8250, loss[loss=0.09955, beats_loss=0.009057, ecapa_loss=0.0001388, whisper_loss=0.08911, over 22566.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001374, whisper_loss=0.091, over 3822183.83 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:41:07,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.71 vs. 
limit=10.0 2024-08-20 13:41:26,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4824380.0, ans=0.125 2024-08-20 13:41:28,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4824380.0, ans=0.125 2024-08-20 13:41:31,571 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 14 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 13:41:40,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.279e+01 2.520e+01 2.852e+01 4.224e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-20 13:41:46,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-08-20 13:42:10,388 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-20 13:42:25,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4824680.0, ans=0.125 2024-08-20 13:42:44,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8300, loss[loss=0.09098, beats_loss=0.008572, ecapa_loss=0.000163, whisper_loss=0.08078, over 15034.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001391, whisper_loss=0.09082, over 3786377.92 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:43:16,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-20 13:43:25,100 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.68 vs. 
limit=12.0 2024-08-20 13:43:28,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4824980.0, ans=0.125 2024-08-20 13:43:40,364 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 13:43:42,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4824980.0, ans=0.1 2024-08-20 13:43:57,844 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-20 13:43:59,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4825080.0, ans=0.2 2024-08-20 13:44:17,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4825180.0, ans=0.0 2024-08-20 13:44:23,242 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8350, loss[loss=0.09141, beats_loss=0.01049, ecapa_loss=0.0001404, whisper_loss=0.07952, over 16383.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001394, whisper_loss=0.09032, over 3759936.48 frames. 
], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:44:42,503 WARNING [optim.py:496] (2/4) Scaling gradients by 0.025893952697515488, model_norm_threshold=50.39912796020508 2024-08-20 13:44:42,674 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.172e+05, grad_sumsq=8.560e+07, orig_rms_sq=1.071e-02 2024-08-20 13:44:58,393 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.380e+01 2.657e+01 2.993e+01 1.946e+03, threshold=5.314e+01, percent-clipped=1.0 2024-08-20 13:44:59,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4825380.0, ans=0.07 2024-08-20 13:45:01,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4825480.0, ans=0.125 2024-08-20 13:45:57,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4825680.0, ans=0.0 2024-08-20 13:46:02,260 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8400, loss[loss=0.1047, beats_loss=0.009533, ecapa_loss=0.0001242, whisper_loss=0.09388, over 23371.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.09007, over 3770357.57 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:46:30,284 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.27 vs. 
limit=15.0 2024-08-20 13:46:45,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4825980.0, ans=0.0 2024-08-20 13:46:50,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4825980.0, ans=0.09899494936611666 2024-08-20 13:46:55,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4825980.0, ans=0.0 2024-08-20 13:47:06,630 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 13:47:20,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4826180.0, ans=0.0 2024-08-20 13:47:22,055 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 13:47:28,398 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 13:47:40,628 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8450, loss[loss=0.1096, beats_loss=0.01002, ecapa_loss=0.0001459, whisper_loss=0.09815, over 15360.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.09029, over 3790801.77 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:47:48,952 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-20 13:47:50,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4826280.0, ans=0.125 2024-08-20 13:48:12,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-08-20 13:48:15,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.337e+01 2.560e+01 2.840e+01 5.771e+01, threshold=5.121e+01, percent-clipped=2.0 2024-08-20 13:48:48,430 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-20 13:48:58,720 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 13:49:17,569 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8500, loss[loss=0.1021, beats_loss=0.01153, ecapa_loss=0.0001248, whisper_loss=0.08928, over 22715.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001401, whisper_loss=0.08964, over 3813334.87 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:49:43,720 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 13:50:06,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-08-20 13:50:12,765 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 13:50:46,427 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8550, loss[loss=0.09377, beats_loss=0.01045, ecapa_loss=0.0001297, whisper_loss=0.08202, over 20515.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.09003, over 3815670.69 frames. ], batch size: 82, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:50:54,849 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2024-08-20 13:51:01,873 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
13 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 13:51:11,100 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4827380.0, ans=0.125 2024-08-20 13:51:20,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.327e+01 2.509e+01 2.689e+01 1.250e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 13:51:30,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4827480.0, ans=0.125 2024-08-20 13:51:33,875 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 34 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 13:51:38,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4827480.0, ans=0.07 2024-08-20 13:51:45,058 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 22 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 13:52:03,769 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 13:52:06,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4827680.0, ans=0.125 2024-08-20 13:52:14,556 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-08-20 13:52:18,414 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8600, loss[loss=0.1096, beats_loss=0.01143, ecapa_loss=0.0001282, whisper_loss=0.09684, over 21092.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.09108, over 3818538.23 frames. 
], batch size: 84, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:52:19,601 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.953e-01 2024-08-20 13:52:29,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-20 13:52:30,850 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 13:53:17,634 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2024-08-20 13:53:19,300 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 13:53:26,455 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.396e+01 2024-08-20 13:53:47,619 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8650, loss[loss=0.1114, beats_loss=0.008167, ecapa_loss=0.0001651, whisper_loss=0.1016, over 23371.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01034, ecapa_loss=0.0001399, whisper_loss=0.09139, over 3870796.91 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:54:19,004 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 13:54:21,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.355e+01 2.590e+01 2.852e+01 2.640e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-20 13:54:28,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4828480.0, ans=0.125 2024-08-20 13:54:35,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4828480.0, ans=0.0 2024-08-20 13:54:38,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4828480.0, ans=0.125 2024-08-20 13:54:46,661 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:54:50,034 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 13:54:54,552 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 13:55:04,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4828680.0, ans=0.1 2024-08-20 13:55:09,818 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:55:11,048 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 13:55:20,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4828780.0, ans=0.125 2024-08-20 13:55:21,895 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8700, loss[loss=0.1005, beats_loss=0.01103, ecapa_loss=0.0001587, whisper_loss=0.08785, over 22120.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.09037, over 3856535.83 frames. 
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:55:27,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4828780.0, ans=0.0 2024-08-20 13:55:49,205 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 16 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 13:56:09,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2024-08-20 13:56:11,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4828980.0, ans=0.1 2024-08-20 13:56:19,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4829080.0, ans=0.1 2024-08-20 13:56:53,949 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8750, loss[loss=0.1068, beats_loss=0.01018, ecapa_loss=0.0001464, whisper_loss=0.09519, over 19527.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001386, whisper_loss=0.09097, over 3818356.99 frames. ], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:57:14,012 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 13:57:22,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4829380.0, ans=0.0 2024-08-20 13:57:29,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.524e+01 2.778e+01 9.782e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 13:57:52,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4829580.0, ans=0.125 2024-08-20 13:57:55,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4829580.0, ans=0.0 2024-08-20 13:57:58,538 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 13:57:59,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4829580.0, ans=0.125 2024-08-20 13:58:24,235 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8800, loss[loss=0.0704, beats_loss=0.009637, ecapa_loss=0.0001853, whisper_loss=0.05891, over 14591.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001379, whisper_loss=0.09043, over 3796668.67 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:58:31,609 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 22 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 13:58:32,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.40 vs. 
limit=22.5 2024-08-20 13:58:38,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4829780.0, ans=10.0 2024-08-20 13:58:52,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2024-08-20 13:59:10,880 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=14.24 vs. limit=15.0 2024-08-20 13:59:55,620 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8850, loss[loss=0.1182, beats_loss=0.008008, ecapa_loss=0.0001709, whisper_loss=0.1085, over 19010.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.09034, over 3754571.53 frames. ], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:00:30,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.235e+01 2.490e+01 2.757e+01 4.655e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-20 14:01:02,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4830580.0, ans=0.0 2024-08-20 14:01:17,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4830680.0, ans=0.125 2024-08-20 14:01:25,712 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 14:01:29,252 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8900, loss[loss=0.09291, beats_loss=0.01114, ecapa_loss=0.0001338, whisper_loss=0.08043, over 23387.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001382, whisper_loss=0.09033, over 3762525.19 frames. 
], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:01:32,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4830780.0, ans=0.035 2024-08-20 14:01:34,372 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-20 14:01:43,780 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-20 14:02:13,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4830980.0, ans=0.125 2024-08-20 14:02:14,041 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 14 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-20 14:02:40,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4831180.0, ans=0.125 2024-08-20 14:02:48,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4831180.0, ans=0.1 2024-08-20 14:02:59,821 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 8950, loss[loss=0.09306, beats_loss=0.01214, ecapa_loss=0.0001259, whisper_loss=0.07966, over 22527.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.09054, over 3778283.86 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:03:30,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.69 vs. 
limit=22.5 2024-08-20 14:03:30,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.318e+01 2.492e+01 2.834e+01 3.721e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 14:03:33,672 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 14:03:44,219 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 14:03:47,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4831480.0, ans=0.1 2024-08-20 14:03:55,744 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 14:04:05,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4831580.0, ans=0.125 2024-08-20 14:04:13,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4831680.0, ans=0.125 2024-08-20 14:04:19,456 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=15.0 2024-08-20 14:04:20,674 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 14:04:24,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.27 vs. limit=6.0 2024-08-20 14:04:26,527 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9000, loss[loss=0.09882, beats_loss=0.01219, ecapa_loss=0.0001205, whisper_loss=0.08542, over 23184.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001392, whisper_loss=0.09044, over 3813879.66 frames. 
], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:04:26,528 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 14:05:10,519 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005032, whisper_loss=0.2493, over 931116.00 frames. 2024-08-20 14:05:34,770 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on SV_voxceleb1: loss=0.003984, beats_loss=0, ecapa_loss=0.0003984, whisper_loss=0, over 944235.00 frames. 2024-08-20 14:07:00,420 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0057, 1.7791, 1.9080, 1.3361, 1.5615, 2.1045, 2.5461, 1.6662], device='cuda:2') 2024-08-20 14:07:37,372 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 14:07:37,376 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 14:07:38,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2024-08-20 14:07:55,304 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.51 vs. 
limit=15.0 2024-08-20 14:07:58,505 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4831880.0, ans=0.125 2024-08-20 14:07:58,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4831880.0, ans=0.0 2024-08-20 14:08:05,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4831880.0, ans=0.2 2024-08-20 14:08:14,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4831980.0, ans=0.05 2024-08-20 14:08:41,262 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 14:08:45,440 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=15.0 2024-08-20 14:09:00,245 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9050, loss[loss=0.09128, beats_loss=0.01118, ecapa_loss=0.00016, whisper_loss=0.0785, over 21872.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.000139, whisper_loss=0.08961, over 3807605.42 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:09:05,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4832280.0, ans=0.1 2024-08-20 14:09:11,567 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 14:09:15,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4832380.0, ans=0.125 2024-08-20 14:09:16,573 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
17 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 14:09:23,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4832380.0, ans=0.125 2024-08-20 14:09:28,131 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 18 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 14:09:29,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.212e+01 2.355e+01 2.625e+01 3.620e+01, threshold=4.711e+01, percent-clipped=0.0 2024-08-20 14:10:16,504 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 14 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 14:10:23,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4832780.0, ans=0.0 2024-08-20 14:10:24,909 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9100, loss[loss=0.07372, beats_loss=0.01251, ecapa_loss=9.674e-05, whisper_loss=0.06023, over 18253.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08894, over 3749777.16 frames. ], batch size: 69, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:11:38,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4833180.0, ans=0.125 2024-08-20 14:11:44,009 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-20 14:11:52,711 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9150, loss[loss=0.09466, beats_loss=0.009541, ecapa_loss=0.0001295, whisper_loss=0.08382, over 14044.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.08978, over 3792739.78 frames. 
], batch size: 55, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:11:55,969 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4833280.0, ans=0.125 2024-08-20 14:12:13,516 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0 2024-08-20 14:12:15,849 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 14:12:22,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.265e+01 2.494e+01 2.821e+01 1.323e+02, threshold=4.988e+01, percent-clipped=2.0 2024-08-20 14:12:29,937 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 14:12:45,197 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 12 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 14:12:51,206 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 14:13:06,715 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 14:13:19,926 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9200, loss[loss=0.1133, beats_loss=0.009678, ecapa_loss=0.0001332, whisper_loss=0.1023, over 13768.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.08935, over 3764025.85 frames. ], batch size: 53, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:13:29,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.15 vs. 
limit=15.0 2024-08-20 14:13:43,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4833880.0, ans=0.0 2024-08-20 14:13:48,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4833880.0, ans=0.2 2024-08-20 14:13:56,252 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4833980.0, ans=0.0 2024-08-20 14:14:24,477 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2024-08-20 14:14:25,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4834080.0, ans=0.125 2024-08-20 14:14:46,043 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9250, loss[loss=0.1179, beats_loss=0.01068, ecapa_loss=0.0001259, whisper_loss=0.106, over 23770.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08948, over 3762577.49 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:15:12,073 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
34 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 14:15:16,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.272e+01 2.596e+01 3.076e+01 4.662e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-20 14:15:18,188 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4834380.0, ans=0.125 2024-08-20 14:15:21,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4834480.0, ans=0.125 2024-08-20 14:15:25,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4834480.0, ans=0.1 2024-08-20 14:15:32,899 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 14:15:35,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4834480.0, ans=0.125 2024-08-20 14:15:52,834 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 14:16:12,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4834780.0, ans=0.125 2024-08-20 14:16:13,256 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9300, loss[loss=0.1165, beats_loss=0.01103, ecapa_loss=0.0001017, whisper_loss=0.1044, over 19945.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001388, whisper_loss=0.08966, over 3751008.74 frames. ], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:16:23,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4834780.0, ans=0.125 2024-08-20 14:16:24,758 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 14:16:40,586 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 14:16:55,914 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-20 14:16:56,805 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-20 14:17:04,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4834980.0, ans=0.0 2024-08-20 14:17:09,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4835080.0, ans=0.125 2024-08-20 14:17:10,306 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 14:17:26,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4835180.0, ans=0.125 2024-08-20 14:17:34,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.96 vs. limit=22.5 2024-08-20 14:17:38,965 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 14:17:44,341 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9350, loss[loss=0.1006, beats_loss=0.01024, ecapa_loss=0.0001352, whisper_loss=0.08905, over 14127.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001389, whisper_loss=0.08999, over 3811602.57 frames. ], batch size: 56, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:17:47,741 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.35 vs. 
limit=15.0 2024-08-20 14:18:17,256 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.510e+01 2.725e+01 8.699e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 14:18:18,248 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4835380.0, ans=0.1 2024-08-20 14:18:22,418 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0 2024-08-20 14:18:27,085 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 14:18:42,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2024-08-20 14:18:43,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4835580.0, ans=0.0 2024-08-20 14:19:16,743 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9400, loss[loss=0.07478, beats_loss=0.01219, ecapa_loss=0.000142, whisper_loss=0.06117, over 13657.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001387, whisper_loss=0.08937, over 3802830.79 frames. ], batch size: 55, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:19:19,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4835780.0, ans=0.09899494936611666 2024-08-20 14:19:21,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4835780.0, ans=0.0 2024-08-20 14:19:24,853 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 24 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 14:19:28,526 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 
23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 14:19:41,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4835880.0, ans=0.125 2024-08-20 14:19:47,251 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 14:19:57,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.20 vs. limit=22.5 2024-08-20 14:20:10,863 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:20:17,426 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.548e-03 2024-08-20 14:20:22,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4836080.0, ans=0.0 2024-08-20 14:20:43,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.71 vs. limit=22.5 2024-08-20 14:20:47,417 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9450, loss[loss=0.1088, beats_loss=0.009542, ecapa_loss=0.0001298, whisper_loss=0.09796, over 23489.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001389, whisper_loss=0.08934, over 3835658.01 frames. 
], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:21:01,698 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4836280.0, ans=0.125 2024-08-20 14:21:20,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.301e+01 2.582e+01 2.889e+01 4.439e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-20 14:21:34,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4836480.0, ans=0.1 2024-08-20 14:21:55,919 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 14:21:57,821 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 14:21:59,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4836680.0, ans=0.04949747468305833 2024-08-20 14:22:03,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4836680.0, ans=0.125 2024-08-20 14:22:21,458 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9500, loss[loss=0.1045, beats_loss=0.009877, ecapa_loss=0.0001319, whisper_loss=0.09327, over 22112.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001384, whisper_loss=0.0894, over 3825047.06 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:22:39,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4836880.0, ans=0.0 2024-08-20 14:22:47,801 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 14:22:58,186 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 14:23:00,969 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-20 14:23:12,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4837080.0, ans=0.2 2024-08-20 14:23:14,054 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 14:23:30,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4837180.0, ans=0.0 2024-08-20 14:23:49,518 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9550, loss[loss=0.1183, beats_loss=0.007495, ecapa_loss=0.0001508, whisper_loss=0.1092, over 18143.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001389, whisper_loss=0.09041, over 3795058.89 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:23:54,938 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 14:23:58,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4837280.0, ans=0.125 2024-08-20 14:24:00,719 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. 
limit=15.0 2024-08-20 14:24:11,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4837380.0, ans=0.1 2024-08-20 14:24:21,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.249e+01 2.469e+01 2.805e+01 3.890e+01, threshold=4.937e+01, percent-clipped=0.0 2024-08-20 14:24:57,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-08-20 14:25:04,929 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 14:25:05,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4837680.0, ans=0.125 2024-08-20 14:25:16,232 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 14:25:19,324 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9600, loss[loss=0.09051, beats_loss=0.009132, ecapa_loss=0.0001241, whisper_loss=0.08013, over 15248.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.08938, over 3779962.69 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:25:47,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4837880.0, ans=0.0 2024-08-20 14:25:56,753 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 14:26:01,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4837980.0, ans=0.125 2024-08-20 14:26:28,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4838180.0, ans=0.2 2024-08-20 14:26:45,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4838180.0, ans=0.1 2024-08-20 14:26:48,012 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9650, loss[loss=0.06831, beats_loss=0.01267, ecapa_loss=0.0001467, whisper_loss=0.05417, over 13058.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01055, ecapa_loss=0.0001391, whisper_loss=0.0886, over 3786619.76 frames. ], batch size: 53, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:26:49,777 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 14:26:51,000 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. 
limit=15.0 2024-08-20 14:27:07,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4838380.0, ans=0.125 2024-08-20 14:27:20,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.272e+01 2.458e+01 2.754e+01 3.251e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-20 14:27:23,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4838480.0, ans=0.125 2024-08-20 14:27:44,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4838580.0, ans=0.0 2024-08-20 14:27:53,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4838580.0, ans=0.125 2024-08-20 14:28:07,397 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 14:28:21,304 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9700, loss[loss=0.1138, beats_loss=0.009781, ecapa_loss=0.0001283, whisper_loss=0.1027, over 23095.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01054, ecapa_loss=0.0001403, whisper_loss=0.08875, over 3783082.38 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:28:23,911 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:28:25,316 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 14:28:29,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4838780.0, ans=0.2 2024-08-20 14:28:49,604 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.28 vs. 
limit=12.0 2024-08-20 14:28:54,478 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 14:28:56,321 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 14:29:00,005 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 14:29:00,510 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2024-08-20 14:29:12,869 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 27 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 14:29:50,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4839180.0, ans=0.125 2024-08-20 14:29:50,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4839180.0, ans=0.125 2024-08-20 14:29:55,254 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9750, loss[loss=0.1244, beats_loss=0.0104, ecapa_loss=0.0001256, whisper_loss=0.1128, over 20023.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0106, ecapa_loss=0.0001388, whisper_loss=0.08908, over 3827311.21 frames. ], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:30:00,502 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 14:30:00,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4839280.0, ans=0.125 2024-08-20 14:30:10,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=4839280.0, ans=15.0 2024-08-20 14:30:27,346 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 14:30:29,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4839380.0, ans=0.1 2024-08-20 14:30:30,214 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.577e+01 2.928e+01 5.580e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:30:48,128 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 14:30:55,363 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4839580.0, ans=0.0 2024-08-20 14:31:28,882 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9800, loss[loss=0.112, beats_loss=0.009752, ecapa_loss=0.000136, whisper_loss=0.1009, over 16468.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.08947, over 3808130.48 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:32:02,925 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 14:32:03,239 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4839880.0, ans=0.125 2024-08-20 14:32:17,936 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4839980.0, ans=0.2 2024-08-20 14:32:25,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4839980.0, ans=10.0 2024-08-20 14:33:06,948 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9850, loss[loss=0.1141, beats_loss=0.008379, ecapa_loss=0.0001504, whisper_loss=0.1042, over 17653.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001398, whisper_loss=0.08896, over 3763570.03 frames. 
], batch size: 70, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:33:14,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4840280.0, ans=0.125 2024-08-20 14:33:42,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.298e+01 2.480e+01 2.698e+01 3.610e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 14:33:44,487 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4840380.0, ans=0.125 2024-08-20 14:33:50,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4840480.0, ans=0.2 2024-08-20 14:33:52,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4840480.0, ans=0.125 2024-08-20 14:34:01,733 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 14:34:07,506 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 14:34:27,226 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4840680.0, ans=0.2 2024-08-20 14:34:35,765 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-20 14:34:37,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-20 14:34:46,789 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9900, loss[loss=0.09858, beats_loss=0.008146, ecapa_loss=0.0001457, whisper_loss=0.08898, over 14693.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001387, whisper_loss=0.08947, over 3748331.41 frames. 
], batch size: 56, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:35:13,909 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 14:35:28,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4840980.0, ans=0.125 2024-08-20 14:36:02,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4841180.0, ans=0.125 2024-08-20 14:36:04,465 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-20 14:36:11,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4841180.0, ans=0.125 2024-08-20 14:36:25,363 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 9950, loss[loss=0.1188, beats_loss=0.007832, ecapa_loss=0.0001359, whisper_loss=0.1096, over 16233.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001383, whisper_loss=0.08946, over 3753183.61 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:36:26,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-20 14:36:35,572 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 14:36:59,339 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.238e+01 2.460e+01 2.685e+01 1.158e+02, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 14:37:22,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. 
limit=12.0 2024-08-20 14:37:34,490 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4841680.0, ans=0.125 2024-08-20 14:37:48,051 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-20 14:37:51,717 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10000, loss[loss=0.1039, beats_loss=0.01143, ecapa_loss=0.0001173, whisper_loss=0.09134, over 13517.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001392, whisper_loss=0.08943, over 3742495.16 frames. ], batch size: 53, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:37:56,083 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4841780.0, ans=0.125 2024-08-20 14:38:01,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4841780.0, ans=0.1 2024-08-20 14:38:24,249 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-08-20 14:38:29,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4841980.0, ans=0.0 2024-08-20 14:38:30,031 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 14:38:31,802 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 14:38:52,643 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 14:39:13,405 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.86 vs. 
limit=15.0 2024-08-20 14:39:33,012 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10050, loss[loss=0.1394, beats_loss=0.007638, ecapa_loss=0.0001225, whisper_loss=0.1305, over 21064.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001388, whisper_loss=0.09015, over 3788846.07 frames. ], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:39:34,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4842280.0, ans=0.125 2024-08-20 14:39:36,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4842280.0, ans=0.125 2024-08-20 14:39:47,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4842280.0, ans=0.125 2024-08-20 14:39:48,775 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 16 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 14:39:57,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4842380.0, ans=0.0 2024-08-20 14:40:04,676 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4842380.0, ans=0.125 2024-08-20 14:40:18,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.420e+01 2.635e+01 2.918e+01 2.672e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-20 14:41:09,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4842680.0, ans=0.1 2024-08-20 14:41:09,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4842680.0, ans=0.0 2024-08-20 14:41:10,973 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
23 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-20 14:41:27,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4842680.0, ans=0.0 2024-08-20 14:41:33,066 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10100, loss[loss=0.08703, beats_loss=0.01182, ecapa_loss=0.0001567, whisper_loss=0.07365, over 14293.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.09017, over 3775668.54 frames. ], batch size: 59, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:41:39,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4842780.0, ans=0.1 2024-08-20 14:41:40,982 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 22 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 14:41:47,858 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.45 vs. limit=15.0 2024-08-20 14:41:50,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4842780.0, ans=0.125 2024-08-20 14:42:19,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4842880.0, ans=0.125 2024-08-20 14:42:30,213 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-20 14:43:07,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4843180.0, ans=0.1 2024-08-20 14:43:14,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4843180.0, ans=0.1 2024-08-20 14:43:21,735 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 14:43:28,197 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10150, loss[loss=0.09513, beats_loss=0.01272, ecapa_loss=0.000129, whisper_loss=0.08112, over 22260.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.08927, over 3758382.19 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:43:40,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4843280.0, ans=0.125 2024-08-20 14:43:44,238 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4843280.0, ans=0.04949747468305833 2024-08-20 14:43:57,428 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 14:44:02,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0 2024-08-20 14:44:03,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.366e+01 2.588e+01 2.870e+01 1.184e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-20 14:44:07,558 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-20 14:44:21,540 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 14:44:32,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4843580.0, ans=0.125 2024-08-20 14:44:57,169 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10200, loss[loss=0.1177, beats_loss=0.009536, ecapa_loss=0.0001534, whisper_loss=0.1066, over 20632.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001398, whisper_loss=0.08924, over 3757836.91 frames. 
], batch size: 82, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:45:25,925 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 30 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 14:45:26,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4843880.0, ans=0.125 2024-08-20 14:45:45,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.55 vs. limit=22.5 2024-08-20 14:45:50,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4844080.0, ans=0.09899494936611666 2024-08-20 14:46:05,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4844080.0, ans=0.125 2024-08-20 14:46:27,324 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10250, loss[loss=0.09858, beats_loss=0.008684, ecapa_loss=0.0001586, whisper_loss=0.08831, over 17265.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.08905, over 3795930.53 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:46:29,402 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 14:46:45,221 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 14:47:03,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.274e+01 2.551e+01 2.893e+01 4.019e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-20 14:48:02,608 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10300, loss[loss=0.07813, beats_loss=0.01249, ecapa_loss=0.000121, whisper_loss=0.06443, over 22561.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001406, whisper_loss=0.08898, over 3815147.70 frames. 
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:49:13,587 WARNING [optim.py:496] (2/4) Scaling gradients by 0.02814776450395584, model_norm_threshold=51.010257720947266 2024-08-20 14:49:13,753 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.746e+05, grad_sumsq=8.746e+05, orig_rms_sq=1.000e+00 2024-08-20 14:49:23,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4845080.0, ans=0.125 2024-08-20 14:49:24,848 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 29 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-20 14:49:37,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4845180.0, ans=0.1 2024-08-20 14:49:41,735 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 36 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 14:49:44,421 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 14:49:49,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4845280.0, ans=0.125 2024-08-20 14:49:50,236 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10350, loss[loss=0.09151, beats_loss=0.01168, ecapa_loss=0.0001315, whisper_loss=0.07852, over 22225.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001421, whisper_loss=0.08935, over 3853436.65 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:50:03,637 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 14:50:19,671 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4845380.0, ans=0.125 2024-08-20 14:50:37,384 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.346e+01 2.546e+01 2.818e+01 1.812e+03, threshold=5.092e+01, percent-clipped=2.0 2024-08-20 14:50:37,674 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 14:50:39,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4845480.0, ans=0.125 2024-08-20 14:50:40,157 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 14:50:44,462 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4845480.0, ans=0.04949747468305833 2024-08-20 14:50:54,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4845480.0, ans=0.95 2024-08-20 14:51:03,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4845580.0, ans=0.0 2024-08-20 14:51:04,150 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 14:51:09,571 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 15 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 14:51:16,477 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 14:51:19,375 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 14:51:31,970 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 14:51:41,209 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-20 14:51:54,350 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10400, loss[loss=0.08382, beats_loss=0.01086, ecapa_loss=0.0001581, whisper_loss=0.07138, over 14043.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001412, whisper_loss=0.08884, over 3841185.59 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:52:18,621 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2024-08-20 14:52:41,581 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-08-20 14:53:03,918 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 18 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 14:53:08,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4846080.0, ans=0.125 2024-08-20 14:53:17,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4846080.0, ans=0.95 2024-08-20 14:53:42,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4846180.0, ans=0.05 2024-08-20 14:53:49,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4846180.0, ans=0.1 2024-08-20 14:53:56,086 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10450, loss[loss=0.1355, beats_loss=0.007422, ecapa_loss=0.0001433, whisper_loss=0.1267, over 24658.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.08928, over 3839619.57 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:53:56,295 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 14:54:01,586 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 19 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-20 14:54:41,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4846480.0, ans=0.125 2024-08-20 14:54:42,240 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.294e+01 2.565e+01 2.796e+01 8.122e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-20 14:54:54,784 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=15.0 2024-08-20 14:54:56,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4846480.0, ans=0.125 2024-08-20 14:55:10,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4846580.0, ans=0.1 2024-08-20 14:55:38,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4846680.0, ans=0.125 2024-08-20 14:55:54,031 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10500, loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001382, whisper_loss=0.09073, over 22656.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01044, ecapa_loss=0.0001406, whisper_loss=0.08853, over 3795070.83 frames. 
], batch size: 93, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:56:07,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=12.0 2024-08-20 14:56:19,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4846880.0, ans=0.125 2024-08-20 14:56:33,631 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=12.0 2024-08-20 14:56:44,107 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 16 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-20 14:56:52,493 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-08-20 14:57:07,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4847080.0, ans=0.0 2024-08-20 14:57:17,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4847080.0, ans=0.2 2024-08-20 14:57:37,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4847180.0, ans=0.1 2024-08-20 14:57:50,336 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10550, loss[loss=0.1004, beats_loss=0.009579, ecapa_loss=0.0001549, whisper_loss=0.08929, over 22486.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001403, whisper_loss=0.08922, over 3793199.90 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:57:55,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4847280.0, ans=0.125 2024-08-20 14:57:58,314 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 14:58:17,047 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.360e+00 2024-08-20 14:58:19,439 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4847380.0, ans=0.125 2024-08-20 14:58:23,536 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4847380.0, ans=0.0 2024-08-20 14:58:31,714 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 14:58:35,815 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.331e+01 2.577e+01 2.959e+01 5.357e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:58:40,257 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 14:58:45,270 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 12 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 14:58:52,482 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2024-08-20 14:58:53,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4847480.0, ans=0.125 2024-08-20 14:59:09,801 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 14:59:11,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.69 vs. 
limit=12.0 2024-08-20 14:59:30,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4847680.0, ans=0.125 2024-08-20 14:59:45,943 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10600, loss[loss=0.09512, beats_loss=0.01088, ecapa_loss=0.0001653, whisper_loss=0.08258, over 14525.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01027, ecapa_loss=0.0001409, whisper_loss=0.08965, over 3779498.82 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:59:56,347 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.546e+01 2024-08-20 14:59:59,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-20 15:00:31,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4847980.0, ans=0.125 2024-08-20 15:01:47,017 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10650, loss[loss=0.1057, beats_loss=0.009723, ecapa_loss=0.0001465, whisper_loss=0.09454, over 22669.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01027, ecapa_loss=0.0001392, whisper_loss=0.08991, over 3763574.62 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:02:03,193 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 15:02:38,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.292e+01 2.599e+01 2.866e+01 5.790e+01, threshold=5.197e+01, percent-clipped=1.0 2024-08-20 15:03:23,463 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 15:03:54,787 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10700, loss[loss=0.0915, beats_loss=0.01154, ecapa_loss=0.0001639, whisper_loss=0.07832, over 20752.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001388, whisper_loss=0.09016, over 3773522.39 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:04:15,840 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-20 15:04:23,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2024-08-20 15:04:24,378 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 15:04:34,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4848880.0, ans=0.125 2024-08-20 15:04:37,821 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 15:04:51,020 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 35 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 15:05:04,530 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 15:06:00,157 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10750, loss[loss=0.109, beats_loss=0.009362, ecapa_loss=0.0001333, whisper_loss=0.09828, over 19803.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.08997, over 3788210.82 frames. 
], batch size: 78, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:06:01,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4849280.0, ans=0.05 2024-08-20 15:06:27,966 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 15:06:37,251 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-08-20 15:06:42,730 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 15:06:49,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.343e+01 2.598e+01 3.019e+01 8.630e+01, threshold=5.195e+01, percent-clipped=2.0 2024-08-20 15:06:52,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4849480.0, ans=0.125 2024-08-20 15:06:59,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4849480.0, ans=0.125 2024-08-20 15:07:01,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4849480.0, ans=0.125 2024-08-20 15:07:02,274 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 15:07:03,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-08-20 15:07:55,351 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
33 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 15:07:55,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4849780.0, ans=0.125 2024-08-20 15:07:57,574 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10800, loss[loss=0.1111, beats_loss=0.01065, ecapa_loss=0.0001336, whisper_loss=0.09913, over 23460.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.0001385, whisper_loss=0.08969, over 3792694.59 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:08:13,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4849780.0, ans=0.125 2024-08-20 15:08:13,610 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. limit=10.0 2024-08-20 15:08:42,813 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-20 15:09:02,454 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 15:09:26,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4850080.0, ans=0.0 2024-08-20 15:09:30,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4850180.0, ans=0.1 2024-08-20 15:09:31,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4850180.0, ans=0.125 2024-08-20 15:09:38,848 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 15:09:53,684 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10850, loss[loss=0.1318, beats_loss=0.007317, ecapa_loss=0.0001196, whisper_loss=0.1233, over 15781.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001373, whisper_loss=0.09034, over 3805725.70 frames. ], batch size: 55, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:10:15,896 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 15:10:27,716 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4850380.0, ans=0.125 2024-08-20 15:10:30,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4850380.0, ans=0.2 2024-08-20 15:10:42,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.250e+01 2.553e+01 2.783e+01 2.694e+02, threshold=5.105e+01, percent-clipped=1.0 2024-08-20 15:11:04,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4850480.0, ans=0.125 2024-08-20 15:11:35,230 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 15:11:57,393 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10900, loss[loss=0.09446, beats_loss=0.0116, ecapa_loss=0.0001358, whisper_loss=0.0815, over 19063.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001376, whisper_loss=0.09043, over 3814243.86 frames. ], batch size: 78, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:12:30,905 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4850880.0, ans=0.1 2024-08-20 15:12:31,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. 
limit=15.0 2024-08-20 15:13:07,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4851080.0, ans=0.125 2024-08-20 15:13:07,694 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-20 15:13:16,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4851080.0, ans=0.125 2024-08-20 15:13:32,186 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 15:13:33,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4851180.0, ans=0.125 2024-08-20 15:13:51,535 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 15:13:54,339 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 10950, loss[loss=0.108, beats_loss=0.008538, ecapa_loss=0.0001287, whisper_loss=0.09818, over 15664.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001369, whisper_loss=0.09054, over 3802931.47 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:14:30,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4851380.0, ans=0.125 2024-08-20 15:14:31,254 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 15:14:34,351 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
22 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-20 15:14:40,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.141e+01 2.427e+01 2.900e+01 4.434e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 15:14:57,975 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 15:14:59,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-08-20 15:15:13,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4851580.0, ans=0.2 2024-08-20 15:15:38,703 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06528465449810028, model_norm_threshold=48.54976272583008 2024-08-20 15:15:38,870 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.33, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.799e+05, orig_rms_sq=1.000e+00 2024-08-20 15:15:53,481 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11000, loss[loss=0.1058, beats_loss=0.01137, ecapa_loss=0.0001467, whisper_loss=0.09294, over 21582.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001379, whisper_loss=0.09104, over 3830896.79 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:15:57,114 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 15:16:47,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4851980.0, ans=0.2 2024-08-20 15:17:03,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4852080.0, ans=0.0 2024-08-20 15:17:14,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4852080.0, ans=0.1 2024-08-20 15:17:17,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4852080.0, ans=0.125 2024-08-20 15:17:26,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=12.0 2024-08-20 15:17:28,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4852180.0, ans=0.0 2024-08-20 15:17:47,112 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11050, loss[loss=0.08349, beats_loss=0.01102, ecapa_loss=0.0001411, whisper_loss=0.07105, over 17152.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001384, whisper_loss=0.09124, over 3861956.96 frames. ], batch size: 72, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:18:10,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4852380.0, ans=0.125 2024-08-20 15:18:22,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4852380.0, ans=0.125 2024-08-20 15:18:35,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.285e+01 2.516e+01 2.757e+01 7.437e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 15:18:43,655 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 15:19:01,929 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0 2024-08-20 15:19:10,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4852580.0, ans=0.125 2024-08-20 15:19:27,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4852680.0, ans=0.1 2024-08-20 15:19:36,944 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2024-08-20 15:19:46,487 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11100, loss[loss=0.06333, beats_loss=0.01468, ecapa_loss=8.654e-05, whisper_loss=0.04779, over 15378.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001375, whisper_loss=0.09116, over 3860088.28 frames. 
], batch size: 59, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:19:50,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4852780.0, ans=0.0 2024-08-20 15:20:26,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4852880.0, ans=0.125 2024-08-20 15:21:02,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4853080.0, ans=0.07 2024-08-20 15:21:09,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4853080.0, ans=0.0 2024-08-20 15:21:17,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4853080.0, ans=0.2 2024-08-20 15:21:31,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4853180.0, ans=0.125 2024-08-20 15:21:34,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4853180.0, ans=0.1 2024-08-20 15:21:44,380 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11150, loss[loss=0.1087, beats_loss=0.009891, ecapa_loss=0.0001315, whisper_loss=0.09745, over 20532.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01035, ecapa_loss=0.0001383, whisper_loss=0.09136, over 3883657.19 frames. 
], batch size: 81, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:21:53,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4853280.0, ans=0.0 2024-08-20 15:22:01,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4853280.0, ans=0.07 2024-08-20 15:22:08,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4853380.0, ans=0.125 2024-08-20 15:22:23,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4853380.0, ans=0.1 2024-08-20 15:22:30,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.306e+01 2.527e+01 2.770e+01 3.887e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-20 15:22:36,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2024-08-20 15:22:41,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4853480.0, ans=0.125 2024-08-20 15:22:49,079 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4853480.0, ans=0.0 2024-08-20 15:22:51,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4853480.0, ans=0.125 2024-08-20 15:23:18,110 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 15:23:38,869 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 12 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 15:23:44,537 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 15:23:46,554 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11200, loss[loss=0.12, beats_loss=0.00973, ecapa_loss=0.0001527, whisper_loss=0.1088, over 22104.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001377, whisper_loss=0.09107, over 3899311.51 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:23:49,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4853780.0, ans=0.125 2024-08-20 15:23:55,948 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 15:24:09,065 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 15:24:19,693 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 25 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 15:24:25,303 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 30 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-20 15:24:37,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4853980.0, ans=0.125 2024-08-20 15:24:53,955 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 30 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-20 15:25:29,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4854180.0, ans=0.0 2024-08-20 15:25:32,126 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-20 15:25:32,778 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 15:25:34,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4854180.0, ans=0.1 2024-08-20 15:25:42,061 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-08-20 15:25:42,797 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 15:25:56,977 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11250, loss[loss=0.1071, beats_loss=0.01009, ecapa_loss=0.0001481, whisper_loss=0.09551, over 22675.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.09115, over 3938331.25 frames. ], batch size: 94, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:26:01,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4854280.0, ans=0.1 2024-08-20 15:26:09,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4854280.0, ans=0.1 2024-08-20 15:26:20,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4854380.0, ans=0.125 2024-08-20 15:26:45,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.343e+01 2.555e+01 2.874e+01 4.205e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 15:26:58,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4854480.0, ans=0.0 2024-08-20 15:27:05,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4854480.0, ans=0.1 2024-08-20 15:27:12,718 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4854580.0, ans=0.125 2024-08-20 15:27:24,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4854580.0, ans=0.125 2024-08-20 15:27:27,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4854580.0, ans=0.125 2024-08-20 15:27:30,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4854580.0, ans=0.125 2024-08-20 15:27:58,055 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11300, loss[loss=0.1027, beats_loss=0.01068, ecapa_loss=0.0001326, whisper_loss=0.09072, over 23274.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001377, whisper_loss=0.09096, over 3916896.72 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:28:00,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-08-20 15:28:02,862 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-20 15:28:30,494 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 15:28:40,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4854880.0, ans=0.1 2024-08-20 15:28:53,874 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 16 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 15:28:56,238 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 15:29:00,726 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 15:29:25,580 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 30 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 15:29:58,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4855280.0, ans=0.2 2024-08-20 15:29:59,445 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11350, loss[loss=0.09907, beats_loss=0.01063, ecapa_loss=0.000176, whisper_loss=0.08668, over 16913.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.000139, whisper_loss=0.09093, over 3892469.62 frames. ], batch size: 72, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:30:29,237 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 15:30:49,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.328e+01 2.511e+01 2.770e+01 2.674e+02, threshold=5.022e+01, percent-clipped=3.0 2024-08-20 15:31:21,538 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 15:32:03,236 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11400, loss[loss=0.1024, beats_loss=0.008707, ecapa_loss=0.0001272, whisper_loss=0.0924, over 16507.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01029, ecapa_loss=0.0001406, whisper_loss=0.09112, over 3870808.21 frames. ], batch size: 63, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:32:34,260 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 13 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 15:32:36,666 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 15:32:58,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4855980.0, ans=0.125 2024-08-20 15:33:02,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4855980.0, ans=0.125 2024-08-20 15:33:27,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4856080.0, ans=0.0 2024-08-20 15:33:39,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4856180.0, ans=0.0 2024-08-20 15:33:47,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4856180.0, ans=0.0 2024-08-20 15:34:02,583 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11450, loss[loss=0.09396, beats_loss=0.01115, ecapa_loss=0.0001499, whisper_loss=0.08132, over 21409.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001391, whisper_loss=0.09111, over 3918042.28 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:34:12,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4856280.0, ans=0.0 2024-08-20 15:34:53,075 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.356e+01 2.634e+01 3.043e+01 3.885e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-20 15:34:56,113 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 14 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 15:35:01,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4856480.0, ans=0.0 2024-08-20 15:35:04,652 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 15:35:50,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4856680.0, ans=0.125 2024-08-20 15:36:01,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4856780.0, ans=0.125 2024-08-20 15:36:02,521 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11500, loss[loss=0.0982, beats_loss=0.01198, ecapa_loss=0.000132, whisper_loss=0.08491, over 23169.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001384, whisper_loss=0.09064, over 3883521.13 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:36:05,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4856780.0, ans=0.125 2024-08-20 15:36:18,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4856780.0, ans=0.0 2024-08-20 15:36:23,888 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
12 from LS+wenet, 31 from Vox, 18 fro AS 2024-08-20 15:36:27,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4856880.0, ans=0.0 2024-08-20 15:36:27,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4856880.0, ans=0.2 2024-08-20 15:36:39,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4856880.0, ans=0.125 2024-08-20 15:36:55,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4856980.0, ans=0.0 2024-08-20 15:36:59,624 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.31 vs. limit=10.0 2024-08-20 15:37:24,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4857080.0, ans=0.05 2024-08-20 15:37:33,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4857180.0, ans=0.2 2024-08-20 15:37:53,876 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11550, loss[loss=0.112, beats_loss=0.01088, ecapa_loss=9.501e-05, whisper_loss=0.1001, over 23276.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001385, whisper_loss=0.09057, over 3860420.56 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:38:12,510 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 15:38:16,156 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 15:38:22,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4857380.0, ans=0.07 2024-08-20 15:38:40,528 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.202e+01 2.508e+01 2.840e+01 4.143e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-20 15:38:54,657 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 17 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 15:38:56,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4857480.0, ans=0.2 2024-08-20 15:38:59,625 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 15:39:14,564 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 15:39:28,260 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 15:39:37,649 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 15:39:46,660 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-08-20 15:39:47,010 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11600, loss[loss=0.09587, beats_loss=0.009721, ecapa_loss=0.0001385, whisper_loss=0.08477, over 17620.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.000138, whisper_loss=0.09005, over 3863507.83 frames. 
], batch size: 73, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:39:57,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4857780.0, ans=0.125 2024-08-20 15:40:18,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4857880.0, ans=0.125 2024-08-20 15:40:40,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-20 15:41:05,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4858080.0, ans=0.0 2024-08-20 15:41:10,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4858080.0, ans=0.1 2024-08-20 15:41:36,339 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11650, loss[loss=0.1096, beats_loss=0.00842, ecapa_loss=0.0001502, whisper_loss=0.09966, over 14925.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.0901, over 3795229.57 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:41:41,252 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-20 15:42:02,004 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 15:42:14,090 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4858380.0, ans=0.07 2024-08-20 15:42:24,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.221e+01 2.529e+01 2.912e+01 8.219e+01, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 15:42:25,223 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 15:42:46,886 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 15:42:48,718 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 15:43:18,094 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 15:43:29,818 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 15:43:33,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4858780.0, ans=0.125 2024-08-20 15:43:34,404 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11700, loss[loss=0.1028, beats_loss=0.0121, ecapa_loss=0.0001309, whisper_loss=0.08938, over 20789.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.000138, whisper_loss=0.08999, over 3798453.75 frames. ], batch size: 84, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:43:46,273 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 15:44:03,196 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-20 15:44:34,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4858980.0, ans=0.125 2024-08-20 15:45:27,530 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11750, loss[loss=0.08632, beats_loss=0.01041, ecapa_loss=0.0001719, whisper_loss=0.0742, over 20450.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.000139, whisper_loss=0.08987, over 3804381.45 frames. 
], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:45:38,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4859280.0, ans=0.0 2024-08-20 15:45:52,126 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 33 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 15:46:08,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4859380.0, ans=0.1 2024-08-20 15:46:08,087 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4859380.0, ans=0.125 2024-08-20 15:46:10,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-08-20 15:46:10,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.330e+01 2.512e+01 2.808e+01 3.989e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 15:46:12,636 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 15:46:14,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.93 vs. 
limit=10.0 2024-08-20 15:46:32,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4859580.0, ans=0.125 2024-08-20 15:46:36,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4859580.0, ans=0.0 2024-08-20 15:46:43,016 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4859580.0, ans=0.0 2024-08-20 15:46:47,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4859580.0, ans=0.125 2024-08-20 15:46:50,061 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03734064847230911, model_norm_threshold=50.2408561706543 2024-08-20 15:46:50,227 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.797e+05, grad_sumsq=2.797e+05, orig_rms_sq=1.000e+00 2024-08-20 15:46:52,383 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 15:46:53,986 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 15:46:54,963 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4859680.0, ans=0.0 2024-08-20 15:47:00,219 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 30 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 15:47:01,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4859680.0, ans=0.0 2024-08-20 15:47:14,975 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11800, loss[loss=0.08706, beats_loss=0.01145, ecapa_loss=0.0001438, whisper_loss=0.07417, over 12849.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01028, ecapa_loss=0.0001393, whisper_loss=0.09055, over 3820876.29 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:47:26,881 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4859780.0, ans=0.2
2024-08-20 15:47:30,883 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS
2024-08-20 15:47:43,299 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 33 from LS+wenet, 20 from Vox, 41 from AS
2024-08-20 15:47:51,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4859880.0, ans=0.0
2024-08-20 15:47:51,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4859880.0, ans=0.2
2024-08-20 15:48:28,287 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 15:48:38,812 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0
2024-08-20 15:48:45,548 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS
2024-08-20 15:48:58,014 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11850, loss[loss=0.1166, beats_loss=0.01005, ecapa_loss=0.0001277, whisper_loss=0.1052, over 23299.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001394, whisper_loss=0.09044, over 3856439.87 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:49:00,143 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 from AS
2024-08-20 15:49:04,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4860280.0, ans=0.2
2024-08-20 15:49:04,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4860280.0, ans=0.0
2024-08-20 15:49:21,402 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 from AS
2024-08-20 15:49:22,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0
2024-08-20 15:49:36,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.263e+01 2.480e+01 2.849e+01 1.345e+03, threshold=4.961e+01, percent-clipped=1.0
2024-08-20 15:49:37,058 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 15:50:28,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4860680.0, ans=0.1
2024-08-20 15:50:37,374 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS
2024-08-20 15:50:38,962 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11900, loss[loss=0.09225, beats_loss=0.01091, ecapa_loss=0.0001232, whisper_loss=0.08011, over 23206.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.09048, over 3838667.89 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:50:40,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4860780.0, ans=0.95
2024-08-20 15:50:43,787 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 16 from LS+wenet, 10 from Vox, 24 from AS
2024-08-20 15:50:45,985 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 from AS
2024-08-20 15:50:56,862 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 15:51:07,589 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5
2024-08-20 15:51:27,244 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 from AS
2024-08-20 15:52:22,948 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 11950, loss[loss=0.08802, beats_loss=0.01098, ecapa_loss=0.0001432, whisper_loss=0.07561, over 14904.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001399, whisper_loss=0.09011, over 3825963.07 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:52:24,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4861280.0, ans=0.125
2024-08-20 15:52:29,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4861280.0, ans=0.0
2024-08-20 15:52:31,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4861280.0, ans=0.125
2024-08-20 15:52:34,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4861280.0, ans=0.125
2024-08-20 15:52:37,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4861280.0, ans=0.0
2024-08-20 15:52:40,955 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 22 from LS+wenet, 26 from Vox, 48 from AS
2024-08-20 15:52:45,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=12.0
2024-08-20 15:52:50,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4861380.0, ans=0.0
2024-08-20 15:52:53,603 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS
2024-08-20 15:53:00,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4861380.0, ans=0.0
2024-08-20 15:53:06,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.342e+01 2.523e+01 2.820e+01 2.544e+02, threshold=5.046e+01, percent-clipped=1.0
2024-08-20 15:53:06,255 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 from AS
2024-08-20 15:53:19,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4861480.0, ans=0.025
2024-08-20 15:53:58,681 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 15:54:06,173 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 from AS
2024-08-20 15:54:13,281 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12000, loss[loss=0.1044, beats_loss=0.009158, ecapa_loss=0.0001432, whisper_loss=0.09379, over 22776.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001384, whisper_loss=0.08945, over 3861719.06 frames.
], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:54:13,282 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss
2024-08-20 15:54:27,386 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5258, 3.8030, 3.5463, 3.5364], device='cuda:2')
2024-08-20 15:54:48,532 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on ASR_libri: loss=0.2555, beats_loss=0, ecapa_loss=0.000501, whisper_loss=0.2505, over 931116.00 frames.
2024-08-20 15:55:14,160 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on SV_voxceleb1: loss=0.003892, beats_loss=0, ecapa_loss=0.0003892, whisper_loss=0, over 944235.00 frames.
2024-08-20 15:56:27,051 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8493, 2.3279, 1.9718, 1.6060, 1.8053, 1.6763, 2.0308, 2.0038], device='cuda:2')
2024-08-20 15:56:55,555 INFO [train_multi_KD3.py:1150] (2/4) Epoch 33, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 15:56:55,561 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB
2024-08-20 15:57:05,138 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 from AS
2024-08-20 15:57:13,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4861880.0, ans=0.2
2024-08-20 15:57:23,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=8.0
2024-08-20 15:57:24,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4861880.0, ans=0.0
2024-08-20 15:57:33,992 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS
2024-08-20 15:57:34,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0
2024-08-20 15:57:40,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4861980.0, ans=0.0
2024-08-20 15:57:50,847 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS
2024-08-20 15:57:53,682 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2024-08-20 15:57:53,686 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0
2024-08-20 15:58:10,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4862180.0, ans=0.0
2024-08-20 15:58:19,397 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12050, loss[loss=0.09952, beats_loss=0.01231, ecapa_loss=0.0001421, whisper_loss=0.08579, over 21301.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001384, whisper_loss=0.08975, over 3852959.12 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:58:23,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4862280.0, ans=0.1
2024-08-20 15:58:28,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4862280.0, ans=0.0
2024-08-20 15:58:39,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4862380.0, ans=0.2
2024-08-20 15:58:53,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.378e+01 2.665e+01 2.948e+01 5.073e+01, threshold=5.329e+01, percent-clipped=1.0
2024-08-20 15:59:08,347 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 from AS
2024-08-20 15:59:44,452 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12100, loss[loss=0.08625, beats_loss=0.01215, ecapa_loss=0.0001532, whisper_loss=0.07256, over 12413.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001392, whisper_loss=0.08982, over 3861001.81 frames. ], batch size: 50, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 15:59:51,366 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS
2024-08-20 16:00:00,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4862880.0, ans=0.015
2024-08-20 16:00:09,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5
2024-08-20 16:00:11,611 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 from AS
2024-08-20 16:00:13,085 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 from AS
2024-08-20 16:00:17,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4862980.0, ans=0.125
2024-08-20 16:00:43,744 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 28 from LS+wenet, 13 from Vox, 21 from AS
2024-08-20 16:00:48,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=15.0
2024-08-20 16:00:50,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4863180.0, ans=0.2
2024-08-20 16:01:07,084 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12150, loss[loss=0.1158, beats_loss=0.01054, ecapa_loss=0.0001482, whisper_loss=0.1038, over 22412.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.08991, over 3853647.08 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 16:01:17,266 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 from AS
2024-08-20 16:01:19,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4863280.0, ans=0.0
2024-08-20 16:01:20,165 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 from AS
2024-08-20 16:01:21,164 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2024-08-20 16:01:25,271 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 from AS
2024-08-20 16:01:38,799 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts.
30 from LS+wenet, 15 from Vox, 27 from AS
2024-08-20 16:01:39,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.292e+01 2.549e+01 2.868e+01 6.331e+01, threshold=5.097e+01, percent-clipped=1.0
2024-08-20 16:01:45,194 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 from AS
2024-08-20 16:01:47,057 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 from AS
2024-08-20 16:01:51,836 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS
2024-08-20 16:01:53,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4863480.0, ans=0.1
2024-08-20 16:02:15,893 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 from AS
2024-08-20 16:02:28,285 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12200, loss[loss=0.1103, beats_loss=0.0092, ecapa_loss=0.0001134, whisper_loss=0.09995, over 16541.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001401, whisper_loss=0.09021, over 3861625.56 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 16:02:35,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4863780.0, ans=0.1
2024-08-20 16:02:38,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4863780.0, ans=0.125
2024-08-20 16:02:47,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0
2024-08-20 16:03:13,324 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=22.5
2024-08-20 16:03:16,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4864080.0, ans=0.2
2024-08-20 16:03:18,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4864080.0, ans=0.0
2024-08-20 16:03:19,705 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 31 from Vox, 22 from AS
2024-08-20 16:03:21,389 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4864080.0, ans=0.2
2024-08-20 16:03:38,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4864180.0, ans=0.1
2024-08-20 16:03:42,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2024-08-20 16:03:49,861 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12250, loss[loss=0.1117, beats_loss=0.01001, ecapa_loss=0.0001398, whisper_loss=0.1003, over 21098.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.09013, over 3878190.50 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 16:03:59,683 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 from AS
2024-08-20 16:04:08,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4864380.0, ans=0.0
2024-08-20 16:04:21,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.269e+01 2.404e+01 2.750e+01 9.360e+01, threshold=4.808e+01, percent-clipped=1.0
2024-08-20 16:04:22,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4864480.0, ans=0.125
2024-08-20 16:04:34,137 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0
2024-08-20 16:04:44,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4864580.0, ans=0.2
2024-08-20 16:05:11,856 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12300, loss[loss=0.1276, beats_loss=0.008578, ecapa_loss=0.0001362, whisper_loss=0.1177, over 15798.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001394, whisper_loss=0.09004, over 3872382.57 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 16:05:26,654 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 from AS
2024-08-20 16:05:31,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4864880.0, ans=0.125
2024-08-20 16:05:36,380 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05332305282354355, model_norm_threshold=48.08091354370117
2024-08-20 16:05:36,547 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.333e+05, grad_sumsq=1.333e+05, orig_rms_sq=1.000e+00
2024-08-20 16:05:43,014 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 from AS
2024-08-20 16:05:52,863 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 from AS
2024-08-20 16:05:53,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4864980.0, ans=0.0
2024-08-20 16:05:59,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4865080.0, ans=0.1
2024-08-20 16:06:34,548 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12350, loss[loss=0.09115, beats_loss=0.01125, ecapa_loss=0.0001223, whisper_loss=0.07868, over 13004.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001392, whisper_loss=0.08915, over 3824706.92 frames. ], batch size: 52, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 16:06:56,842 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS
2024-08-20 16:07:08,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.318e+01 2.528e+01 2.855e+01 9.017e+02, threshold=5.055e+01, percent-clipped=1.0
2024-08-20 16:07:09,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs.
limit=15.0
2024-08-20 16:07:22,799 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 from AS
2024-08-20 16:07:24,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4865580.0, ans=0.1
2024-08-20 16:07:30,067 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 14 from LS+wenet, 12 from Vox, 23 from AS
2024-08-20 16:07:40,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4865580.0, ans=0.2
2024-08-20 16:07:40,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-08-20 16:07:45,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4865680.0, ans=0.0
2024-08-20 16:07:52,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4865680.0, ans=0.125
2024-08-20 16:07:57,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4865680.0, ans=0.125
2024-08-20 16:08:00,598 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12400, loss[loss=0.09875, beats_loss=0.008431, ecapa_loss=0.0001587, whisper_loss=0.08874, over 15644.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.08901, over 3812189.31 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:08:22,464 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-08-20 16:08:28,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4865880.0, ans=0.125
2024-08-20 16:08:28,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4865880.0, ans=0.0
2024-08-20 16:08:45,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4865980.0, ans=15.0
2024-08-20 16:08:46,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4865980.0, ans=0.125
2024-08-20 16:08:58,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4866080.0, ans=0.2
2024-08-20 16:09:06,555 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0
2024-08-20 16:09:21,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2024-08-20 16:09:36,251 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 17 from LS+wenet, 19 from Vox, 35 from AS
2024-08-20 16:09:39,952 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12450, loss[loss=0.09415, beats_loss=0.01034, ecapa_loss=0.0001387, whisper_loss=0.08242, over 17111.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001385, whisper_loss=0.0886, over 3798927.91 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:09:40,201 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 from AS
2024-08-20 16:09:44,742 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 from AS
2024-08-20 16:09:50,929 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 18 from LS+wenet, 27 from Vox, 45 from AS
2024-08-20 16:09:52,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4866280.0, ans=0.125
2024-08-20 16:09:53,073 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 from AS
2024-08-20 16:10:05,851 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 30 from LS+wenet, 15 from Vox, 23 from AS
2024-08-20 16:10:22,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.270e+01 2.513e+01 2.843e+01 4.408e+01, threshold=5.027e+01, percent-clipped=0.0
2024-08-20 16:10:35,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4866480.0, ans=0.1
2024-08-20 16:10:39,010 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 from AS
2024-08-20 16:10:42,733 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 32 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 16:10:55,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4866580.0, ans=0.0
2024-08-20 16:11:01,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4866680.0, ans=0.125
2024-08-20 16:11:23,878 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12500, loss[loss=0.08774, beats_loss=0.01073, ecapa_loss=0.0001269, whisper_loss=0.07575, over 17922.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001386, whisper_loss=0.08916, over 3803171.83 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:11:43,119 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 21 from LS+wenet, 18 from Vox, 45 from AS
2024-08-20 16:11:51,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4866880.0, ans=0.125
2024-08-20 16:11:55,470 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0
2024-08-20 16:11:57,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4866880.0, ans=0.125
2024-08-20 16:12:02,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4866880.0, ans=0.125
2024-08-20 16:12:20,205 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4866980.0, ans=0.1
2024-08-20 16:12:34,438 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 38 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 16:12:50,735 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS
2024-08-20 16:12:56,432 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4867180.0, ans=0.2
2024-08-20 16:13:12,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4867180.0, ans=0.2
2024-08-20 16:13:15,593 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12550, loss[loss=0.1198, beats_loss=0.007009, ecapa_loss=0.0001262, whisper_loss=0.1116, over 17329.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.000139, whisper_loss=0.08954, over 3804659.06 frames.
], batch size: 65, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:13:43,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2024-08-20 16:13:51,292 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 from AS
2024-08-20 16:14:02,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.459e+01 2.718e+01 3.101e+01 5.496e+01, threshold=5.435e+01, percent-clipped=1.0
2024-08-20 16:14:28,370 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0
2024-08-20 16:14:34,025 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS
2024-08-20 16:14:36,700 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 from AS
2024-08-20 16:14:42,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4867580.0, ans=0.1
2024-08-20 16:14:52,373 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.38 vs. limit=10.0
2024-08-20 16:15:12,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4867780.0, ans=0.125
2024-08-20 16:15:13,166 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12600, loss[loss=0.1052, beats_loss=0.008687, ecapa_loss=0.0001116, whisper_loss=0.09542, over 14137.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.08916, over 3792852.34 frames. ], batch size: 51, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:15:27,095 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 from AS
2024-08-20 16:15:38,934 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 from AS
2024-08-20 16:15:52,736 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 25 from LS+wenet, 29 from Vox, 41 from AS
2024-08-20 16:15:53,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4867880.0, ans=0.125
2024-08-20 16:16:24,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2024-08-20 16:16:42,867 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 from AS
2024-08-20 16:16:49,451 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 from AS
2024-08-20 16:16:50,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4868180.0, ans=0.0
2024-08-20 16:17:05,906 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12650, loss[loss=0.1404, beats_loss=0.007797, ecapa_loss=0.0001229, whisper_loss=0.1314, over 16448.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.08945, over 3790671.20 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:17:14,684 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 16:17:20,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=4868280.0, ans=0.2
2024-08-20 16:17:28,720 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS
2024-08-20 16:17:37,287 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.024e+05
2024-08-20 16:17:51,142 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.312e+01 2.541e+01 2.719e+01 3.789e+01, threshold=5.083e+01, percent-clipped=0.0
2024-08-20 16:17:59,088 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 from AS
2024-08-20 16:18:01,561 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 from AS
2024-08-20 16:18:19,973 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 16:18:26,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-08-20 16:18:30,511 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS
2024-08-20 16:18:57,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4868780.0, ans=0.05
2024-08-20 16:18:58,743 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12700, loss[loss=0.09107, beats_loss=0.009547, ecapa_loss=0.0001829, whisper_loss=0.07969, over 14083.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001387, whisper_loss=0.08935, over 3798034.99 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:19:07,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4868780.0, ans=0.125
2024-08-20 16:19:14,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0
2024-08-20 16:19:16,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4868780.0, ans=0.125
2024-08-20 16:19:21,118 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4868880.0, ans=0.2
2024-08-20 16:19:26,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4868880.0, ans=0.125
2024-08-20 16:19:34,035 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 21 from LS+wenet, 25 from Vox, 50 from AS
2024-08-20 16:19:49,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4868980.0, ans=6.0
2024-08-20 16:20:00,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4868980.0, ans=0.2
2024-08-20 16:20:29,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4869180.0, ans=0.125
2024-08-20 16:20:30,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0
2024-08-20 16:20:40,620 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 19 from LS+wenet, 28 from Vox, 42 from AS
2024-08-20 16:20:51,321 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12750, loss[loss=0.1269, beats_loss=0.009635, ecapa_loss=0.0001326, whisper_loss=0.116, over 23338.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01057, ecapa_loss=0.0001376, whisper_loss=0.08828, over 3779494.93 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:21:01,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4869280.0, ans=0.05
2024-08-20 16:21:10,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4869280.0, ans=0.0
2024-08-20 16:21:13,726 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-20 16:21:28,215 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS
2024-08-20 16:21:34,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4869480.0, ans=0.1
2024-08-20 16:21:35,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.357e+01 2.635e+01 3.039e+01 5.268e+01, threshold=5.270e+01, percent-clipped=2.0
2024-08-20 16:21:38,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4869480.0, ans=0.0
2024-08-20 16:21:49,690 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS
2024-08-20 16:22:00,989 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4869580.0, ans=0.2
2024-08-20 16:22:10,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4869580.0, ans=0.125
2024-08-20 16:22:32,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4869680.0, ans=0.125
2024-08-20 16:22:36,635 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12800, loss[loss=0.08194, beats_loss=0.01181, ecapa_loss=0.0001116, whisper_loss=0.06901, over 14713.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01053, ecapa_loss=0.0001376, whisper_loss=0.08872, over 3815222.82 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:22:38,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4869780.0, ans=0.125
2024-08-20 16:23:26,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4869980.0, ans=0.125
2024-08-20 16:23:32,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4869980.0, ans=0.125
2024-08-20 16:23:41,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4869980.0, ans=0.125
2024-08-20 16:23:48,973 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 from AS
2024-08-20 16:23:53,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4870080.0, ans=0.5
2024-08-20 16:24:01,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4870080.0, ans=0.2
2024-08-20 16:24:10,948 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 from AS
2024-08-20 16:24:24,101 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-20 16:24:26,341 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12850, loss[loss=0.09282, beats_loss=0.01023, ecapa_loss=0.0001658, whisper_loss=0.08092, over 13291.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001383, whisper_loss=0.08917, over 3811089.45 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 5.764607523034235e+17
2024-08-20 16:24:31,409 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts.
26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 16:25:08,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4870380.0, ans=0.125 2024-08-20 16:25:09,772 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 16:25:11,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.389e+01 2.612e+01 2.924e+01 4.831e+01, threshold=5.224e+01, percent-clipped=0.0 2024-08-20 16:25:11,574 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 16:25:16,332 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 16:25:51,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4870680.0, ans=0.125 2024-08-20 16:26:08,771 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 16 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 16:26:11,068 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 12 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 16:26:12,664 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12900, loss[loss=0.0715, beats_loss=0.01521, ecapa_loss=0.0001285, whisper_loss=0.055, over 13915.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01061, ecapa_loss=0.0001385, whisper_loss=0.08822, over 3822280.90 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:26:17,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4870780.0, ans=0.125 2024-08-20 16:26:28,411 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-20 16:27:01,660 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 16:27:19,445 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 16:27:22,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4871080.0, ans=0.0 2024-08-20 16:27:56,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4871180.0, ans=0.125 2024-08-20 16:27:58,871 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 12950, loss[loss=0.1207, beats_loss=0.00737, ecapa_loss=0.0001545, whisper_loss=0.1118, over 20849.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001384, whisper_loss=0.089, over 3823059.57 frames. ], batch size: 83, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:28:02,336 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 16:28:05,405 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4871280.0, ans=0.1 2024-08-20 16:28:21,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4871380.0, ans=0.125 2024-08-20 16:28:40,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.303e+01 2.529e+01 2.820e+01 1.360e+02, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 16:29:03,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4871580.0, ans=0.2 2024-08-20 16:29:28,255 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 16:29:37,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4871680.0, ans=0.125 2024-08-20 16:29:40,537 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08873618394136429, model_norm_threshold=50.58396911621094 2024-08-20 16:29:40,701 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.213e+04, grad_sumsq=8.006e+03, orig_rms_sq=9.010e+00 2024-08-20 16:29:47,544 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13000, loss[loss=0.09398, beats_loss=0.01334, ecapa_loss=0.0001049, whisper_loss=0.0796, over 22513.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.000139, whisper_loss=0.0896, over 3858096.41 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:30:02,802 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 21 from LS+wenet, 15 from Vox, 15 fro AS 2024-08-20 16:30:11,519 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0 2024-08-20 16:30:33,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.85 vs. 
limit=12.0 2024-08-20 16:30:33,833 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08446179330348969, model_norm_threshold=50.58396911621094 2024-08-20 16:30:33,998 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.496e+04, grad_sumsq=4.191e+06, orig_rms_sq=1.073e-02 2024-08-20 16:30:40,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4871980.0, ans=0.2 2024-08-20 16:30:40,984 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 16:30:53,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4871980.0, ans=0.1 2024-08-20 16:31:12,776 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2024-08-20 16:31:14,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4872080.0, ans=0.1 2024-08-20 16:31:29,249 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 16:31:30,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-20 16:31:39,783 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13050, loss[loss=0.1059, beats_loss=0.01065, ecapa_loss=0.0001134, whisper_loss=0.09412, over 19952.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0107, ecapa_loss=0.0001394, whisper_loss=0.08882, over 3839505.47 frames. 
], batch size: 78, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:31:57,033 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4872280.0, ans=0.1 2024-08-20 16:31:57,272 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-20 16:32:02,160 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 19 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-20 16:32:18,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4872380.0, ans=0.125 2024-08-20 16:32:21,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.399e+01 2.559e+01 2.848e+01 5.989e+02, threshold=5.117e+01, percent-clipped=3.0 2024-08-20 16:32:54,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4872580.0, ans=0.125 2024-08-20 16:33:02,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4872580.0, ans=0.0 2024-08-20 16:33:06,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-20 16:33:11,437 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 15 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 16:33:27,454 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13100, loss[loss=0.07278, beats_loss=0.01058, ecapa_loss=0.0001619, whisper_loss=0.06058, over 15062.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001397, whisper_loss=0.08899, over 3794364.14 frames. 
], batch size: 64, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:33:29,700 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 16:33:40,404 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4872780.0, ans=0.125 2024-08-20 16:33:46,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4872780.0, ans=0.1 2024-08-20 16:33:47,521 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 14 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 16:34:10,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4872880.0, ans=0.07 2024-08-20 16:34:42,115 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 16:34:43,398 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-20 16:35:23,942 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13150, loss[loss=0.08511, beats_loss=0.01108, ecapa_loss=0.0001579, whisper_loss=0.07245, over 17490.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001392, whisper_loss=0.08891, over 3836622.89 frames. ], batch size: 73, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:35:36,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4873280.0, ans=0.125 2024-08-20 16:36:10,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.480e+01 2.703e+01 4.896e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 16:36:37,253 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 
17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 16:36:42,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4873580.0, ans=0.125 2024-08-20 16:36:46,030 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 21 from LS+wenet, 32 from Vox, 25 fro AS 2024-08-20 16:36:50,413 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 16:37:13,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4873680.0, ans=0.125 2024-08-20 16:37:15,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4873780.0, ans=0.2 2024-08-20 16:37:15,787 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13200, loss[loss=0.1109, beats_loss=0.009479, ecapa_loss=0.0001244, whisper_loss=0.1002, over 23682.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01057, ecapa_loss=0.0001387, whisper_loss=0.08866, over 3790052.60 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:37:17,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4873780.0, ans=0.0 2024-08-20 16:37:20,209 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 16:37:37,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.56 vs. 
limit=15.0 2024-08-20 16:37:45,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4873880.0, ans=0.2 2024-08-20 16:37:45,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4873880.0, ans=0.09899494936611666 2024-08-20 16:38:25,545 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=12.0 2024-08-20 16:38:28,617 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 16:38:33,545 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 16:38:55,789 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 16:39:05,622 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13250, loss[loss=0.1066, beats_loss=0.009903, ecapa_loss=0.000143, whisper_loss=0.09531, over 20650.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01054, ecapa_loss=0.0001395, whisper_loss=0.08888, over 3807891.17 frames. ], batch size: 81, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:39:12,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=15.0 2024-08-20 16:39:22,752 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4874280.0, ans=0.125 2024-08-20 16:39:47,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.277e+01 2.601e+01 3.009e+01 4.180e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-20 16:39:48,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4874480.0, ans=0.0 2024-08-20 16:39:48,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4874480.0, ans=0.2 2024-08-20 16:39:55,234 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 16:39:57,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4874480.0, ans=0.2 2024-08-20 16:39:57,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4874480.0, ans=0.0 2024-08-20 16:40:03,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2024-08-20 16:40:08,848 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 28 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 16:40:10,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4874580.0, ans=0.2 2024-08-20 16:40:27,655 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 16:40:37,875 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. 
limit=15.0 2024-08-20 16:40:45,952 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4874680.0, ans=0.0 2024-08-20 16:40:46,882 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 16 from LS+wenet, 24 from Vox, 12 fro AS 2024-08-20 16:40:48,952 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-20 16:40:51,044 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13300, loss[loss=0.09039, beats_loss=0.008969, ecapa_loss=0.000152, whisper_loss=0.0799, over 17265.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01054, ecapa_loss=0.0001385, whisper_loss=0.08892, over 3812419.26 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:41:46,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4874980.0, ans=0.1 2024-08-20 16:42:03,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4875080.0, ans=0.0 2024-08-20 16:42:12,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4875080.0, ans=0.125 2024-08-20 16:42:40,145 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13350, loss[loss=0.1032, beats_loss=0.01103, ecapa_loss=0.0001302, whisper_loss=0.09086, over 15103.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001388, whisper_loss=0.08959, over 3805821.32 frames. 
], batch size: 59, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:43:04,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=4875380.0, ans=0.02 2024-08-20 16:43:23,133 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.248e+01 2.436e+01 2.816e+01 2.858e+02, threshold=4.871e+01, percent-clipped=3.0 2024-08-20 16:43:40,689 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 39 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 16:43:51,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2024-08-20 16:44:12,051 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 16:44:34,397 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13400, loss[loss=0.1112, beats_loss=0.009214, ecapa_loss=0.0001213, whisper_loss=0.1008, over 17402.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001381, whisper_loss=0.08912, over 3796934.69 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:44:57,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-20 16:45:00,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4875880.0, ans=0.0 2024-08-20 16:45:05,096 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 16:45:10,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4875880.0, ans=0.2 2024-08-20 16:45:29,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4875980.0, ans=0.125 2024-08-20 16:45:30,043 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=12.0 2024-08-20 16:46:26,294 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 18 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 16:46:33,333 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13450, loss[loss=0.09607, beats_loss=0.01005, ecapa_loss=0.0001568, whisper_loss=0.08445, over 17268.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.08953, over 3782644.59 frames. ], batch size: 73, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:46:39,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4876280.0, ans=0.1 2024-08-20 16:47:01,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4876380.0, ans=15.0 2024-08-20 16:47:03,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4876380.0, ans=0.125 2024-08-20 16:47:10,922 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 16:47:17,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4876380.0, ans=0.125 2024-08-20 16:47:20,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.518e+01 2.794e+01 2.882e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 16:48:00,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4876580.0, ans=0.05 2024-08-20 16:48:03,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4876680.0, ans=0.0 2024-08-20 16:48:25,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-20 16:48:25,851 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13500, loss[loss=0.1116, beats_loss=0.009447, ecapa_loss=0.000135, whisper_loss=0.1008, over 22528.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.08984, over 3799118.34 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:48:26,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4876780.0, ans=0.0 2024-08-20 16:49:19,787 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 16:49:20,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4876980.0, ans=0.2 2024-08-20 16:49:37,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. 
limit=22.5 2024-08-20 16:49:51,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4877080.0, ans=0.1 2024-08-20 16:49:54,566 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 16:50:18,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4877180.0, ans=0.0 2024-08-20 16:50:21,771 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13550, loss[loss=0.101, beats_loss=0.008918, ecapa_loss=0.0001287, whisper_loss=0.09083, over 17534.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.08977, over 3804631.21 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:51:02,200 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:51:02,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4877380.0, ans=0.0 2024-08-20 16:51:10,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.534e+01 2.808e+01 5.425e+01, threshold=5.068e+01, percent-clipped=1.0 2024-08-20 16:51:25,544 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 18 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 16:51:26,806 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4877480.0, ans=0.0 2024-08-20 16:51:38,037 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-20 16:51:46,842 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 16:52:12,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4877680.0, ans=0.0 2024-08-20 16:52:17,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4877680.0, ans=22.5 2024-08-20 16:52:23,273 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13600, loss[loss=0.1108, beats_loss=0.009926, ecapa_loss=0.0001285, whisper_loss=0.09955, over 23237.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001388, whisper_loss=0.09027, over 3792885.60 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:52:42,593 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 16:52:53,113 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0 2024-08-20 16:53:11,722 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 16:53:14,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4877980.0, ans=0.0 2024-08-20 16:53:49,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4878080.0, ans=0.125 2024-08-20 16:54:24,384 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13650, loss[loss=0.138, beats_loss=0.008489, ecapa_loss=0.0001748, whisper_loss=0.1278, over 21667.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.09039, over 3769983.63 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:54:28,880 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 16:54:37,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4878280.0, ans=0.0 2024-08-20 16:54:52,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4878380.0, ans=0.125 2024-08-20 16:55:08,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4878380.0, ans=0.1 2024-08-20 16:55:11,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.380e+01 2.594e+01 2.939e+01 1.944e+02, threshold=5.188e+01, percent-clipped=3.0 2024-08-20 16:56:22,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4878780.0, ans=0.0 2024-08-20 16:56:23,404 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13700, loss[loss=0.1033, beats_loss=0.008609, ecapa_loss=0.0002026, whisper_loss=0.09264, over 20508.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.0899, over 3752725.15 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:57:44,941 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 16:58:08,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4879180.0, ans=0.0 2024-08-20 16:58:17,609 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13750, loss[loss=0.1009, beats_loss=0.01023, ecapa_loss=0.0001202, whisper_loss=0.08947, over 12938.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001384, whisper_loss=0.09027, over 3800490.10 frames. 
], batch size: 49, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:58:18,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4879280.0, ans=0.125 2024-08-20 16:58:27,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4879280.0, ans=0.125 2024-08-20 16:58:30,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4879280.0, ans=0.125 2024-08-20 16:58:42,523 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 35 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 16:59:03,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.268e+01 2.560e+01 2.819e+01 5.576e+01, threshold=5.121e+01, percent-clipped=1.0 2024-08-20 16:59:13,039 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 16:59:32,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4879580.0, ans=10.0 2024-08-20 16:59:34,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4879580.0, ans=0.05 2024-08-20 17:00:00,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4879680.0, ans=0.125 2024-08-20 17:00:12,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4879680.0, ans=0.125 2024-08-20 17:00:15,481 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13800, loss[loss=0.1004, beats_loss=0.012, ecapa_loss=0.0001262, whisper_loss=0.08717, over 16322.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001386, whisper_loss=0.09056, over 3810300.50 frames. 
], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:00:25,477 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 17 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 17:00:27,340 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 17:00:30,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4879780.0, ans=0.09899494936611666 2024-08-20 17:00:46,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4879880.0, ans=0.0 2024-08-20 17:00:51,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4879880.0, ans=0.0 2024-08-20 17:01:01,509 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 17:01:34,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4880080.0, ans=0.0 2024-08-20 17:01:49,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-20 17:01:59,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4880180.0, ans=0.07 2024-08-20 17:02:11,003 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-08-20 17:02:14,265 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13850, loss[loss=0.1173, beats_loss=0.01002, ecapa_loss=0.0001163, whisper_loss=0.1062, over 23677.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001391, whisper_loss=0.08988, over 3833850.32 frames. 
], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:02:28,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0 2024-08-20 17:02:37,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2024-08-20 17:02:43,357 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 17:02:51,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4880380.0, ans=0.0 2024-08-20 17:02:54,860 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-20 17:03:01,945 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.225e+01 2.417e+01 2.709e+01 3.540e+01, threshold=4.834e+01, percent-clipped=0.0 2024-08-20 17:03:10,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4880480.0, ans=0.1 2024-08-20 17:03:15,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4880480.0, ans=0.125 2024-08-20 17:03:16,326 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 9 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 17:03:21,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=22.5 2024-08-20 17:03:25,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. 
limit=15.0 2024-08-20 17:03:34,906 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-20 17:03:40,547 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 19 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-20 17:03:44,979 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 27 from LS+wenet, 16 from Vox, 15 fro AS 2024-08-20 17:04:01,491 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 15 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 17:04:06,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4880680.0, ans=0.125 2024-08-20 17:04:10,311 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13900, loss[loss=0.09317, beats_loss=0.009, ecapa_loss=0.000114, whisper_loss=0.08303, over 17691.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.000139, whisper_loss=0.08935, over 3838886.85 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:04:10,554 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 21 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-20 17:04:29,866 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:04:35,784 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 17:04:37,752 WARNING [optim.py:496] (2/4) Scaling gradients by 0.016832223162055016, model_norm_threshold=48.33732604980469 2024-08-20 17:04:37,936 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.526e+05, grad_sumsq=7.526e+05, orig_rms_sq=1.000e+00 2024-08-20 17:05:07,746 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 17:05:16,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4881080.0, ans=0.0 2024-08-20 17:05:50,912 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 12 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 17:05:57,360 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 13950, loss[loss=0.09206, beats_loss=0.01122, ecapa_loss=0.0001315, whisper_loss=0.07953, over 20628.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001404, whisper_loss=0.09021, over 3832454.03 frames. ], batch size: 85, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:06:01,955 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 17:06:02,934 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4881280.0, ans=0.1 2024-08-20 17:06:05,812 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 17:06:40,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.333e+01 2.602e+01 2.929e+01 2.872e+03, threshold=5.204e+01, percent-clipped=2.0 2024-08-20 17:07:02,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4881580.0, ans=0.0 2024-08-20 17:07:02,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2024-08-20 17:07:34,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4881680.0, ans=0.0 2024-08-20 17:07:37,396 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
15 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 17:07:43,296 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14000, loss[loss=0.124, beats_loss=0.008473, ecapa_loss=0.0001623, whisper_loss=0.1139, over 21809.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.09106, over 3828372.42 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:07:59,212 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 30 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 17:08:01,441 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 17:08:12,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4881880.0, ans=0.2 2024-08-20 17:08:31,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4881980.0, ans=0.0 2024-08-20 17:08:49,091 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 17:08:58,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4882080.0, ans=0.0 2024-08-20 17:09:31,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4882180.0, ans=0.0 2024-08-20 17:09:37,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4882280.0, ans=0.2 2024-08-20 17:09:38,435 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14050, loss[loss=0.1016, beats_loss=0.01088, ecapa_loss=0.0001109, whisper_loss=0.08965, over 17845.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.09095, over 3843319.02 frames. 
], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:09:58,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4882280.0, ans=0.125 2024-08-20 17:10:06,459 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 17:10:19,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4882380.0, ans=10.0 2024-08-20 17:10:25,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.229e+01 2.456e+01 2.749e+01 5.293e+01, threshold=4.913e+01, percent-clipped=1.0 2024-08-20 17:10:31,420 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4882480.0, ans=0.125 2024-08-20 17:10:32,529 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 17:10:46,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4882480.0, ans=0.0 2024-08-20 17:10:58,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-20 17:11:24,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4882680.0, ans=0.125 2024-08-20 17:11:36,464 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14100, loss[loss=0.1122, beats_loss=0.01007, ecapa_loss=0.0001343, whisper_loss=0.1008, over 23108.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01041, ecapa_loss=0.0001404, whisper_loss=0.09121, over 3853746.48 frames. 
], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:11:53,921 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-08-20 17:12:28,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4882980.0, ans=0.2 2024-08-20 17:13:03,354 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 17:13:29,058 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 17:13:33,713 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14150, loss[loss=0.09752, beats_loss=0.009713, ecapa_loss=0.0001249, whisper_loss=0.08656, over 17308.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001403, whisper_loss=0.09113, over 3818859.77 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:13:59,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-20 17:14:13,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4883380.0, ans=0.125 2024-08-20 17:14:18,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.324e+01 2.479e+01 2.720e+01 4.062e+01, threshold=4.958e+01, percent-clipped=0.0 2024-08-20 17:14:42,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4883580.0, ans=0.125 2024-08-20 17:14:48,053 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 17:15:09,261 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 17:15:14,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4883680.0, ans=0.125 2024-08-20 17:15:22,822 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2024-08-20 17:15:25,199 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14200, loss[loss=0.1148, beats_loss=0.009602, ecapa_loss=0.0001356, whisper_loss=0.1039, over 21210.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.09126, over 3855104.82 frames. ], batch size: 84, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:15:27,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4883780.0, ans=0.125 2024-08-20 17:15:45,710 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-20 17:16:05,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=4883980.0, ans=0.1 2024-08-20 17:16:05,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4883980.0, ans=0.1 2024-08-20 17:16:14,734 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 17:16:27,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4884080.0, ans=0.125 2024-08-20 17:16:57,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4884180.0, ans=0.0 2024-08-20 17:17:02,423 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
20 from LS+wenet, 29 from Vox, 45 fro AS 2024-08-20 17:17:03,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2024-08-20 17:17:10,848 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14250, loss[loss=0.1035, beats_loss=0.01194, ecapa_loss=0.0001064, whisper_loss=0.09052, over 23525.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.09064, over 3869391.67 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:17:20,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4884280.0, ans=0.0 2024-08-20 17:17:26,006 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 23 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 17:17:37,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4884380.0, ans=0.0 2024-08-20 17:17:37,905 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:17:41,991 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.914e+00 2024-08-20 17:17:53,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.287e+01 2.497e+01 2.839e+01 4.280e+02, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 17:18:04,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4884480.0, ans=0.015 2024-08-20 17:18:12,155 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 17:18:38,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4884680.0, ans=0.2 2024-08-20 17:18:49,282 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2024-08-20 17:18:53,725 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14300, loss[loss=0.1114, beats_loss=0.01083, ecapa_loss=0.0001192, whisper_loss=0.09936, over 23904.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.09016, over 3866827.76 frames. ], batch size: 95, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:19:08,212 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-20 17:19:29,323 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 17:19:45,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4884980.0, ans=0.1 2024-08-20 17:19:53,253 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 17:19:55,196 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 17:19:58,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4885080.0, ans=0.05 2024-08-20 17:20:12,803 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 17:20:30,146 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 17:20:38,293 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14350, loss[loss=0.08856, beats_loss=0.01206, ecapa_loss=0.0001203, whisper_loss=0.07529, over 17434.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001395, whisper_loss=0.0903, over 3810180.34 frames. ], batch size: 68, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:20:49,810 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.290e+00 2024-08-20 17:21:11,036 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 17:21:12,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4885380.0, ans=0.125 2024-08-20 17:21:14,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-20 17:21:19,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.424e+01 2.742e+01 3.115e+01 1.804e+02, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 17:21:29,753 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 32 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 17:21:49,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4885580.0, ans=0.1 2024-08-20 17:21:49,382 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4885580.0, ans=0.5 2024-08-20 17:22:04,902 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.633e+00 2024-08-20 17:22:11,204 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
22 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 17:22:11,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4885680.0, ans=10.0 2024-08-20 17:22:12,078 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4885680.0, ans=0.125 2024-08-20 17:22:18,668 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14400, loss[loss=0.12, beats_loss=0.009792, ecapa_loss=0.0001296, whisper_loss=0.1089, over 23516.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001389, whisper_loss=0.0903, over 3815079.80 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:22:19,966 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=22.5 2024-08-20 17:22:26,263 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 17:22:27,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4885780.0, ans=0.09899494936611666 2024-08-20 17:23:22,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4886080.0, ans=0.125 2024-08-20 17:23:52,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4886180.0, ans=0.0 2024-08-20 17:23:58,244 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14450, loss[loss=0.09253, beats_loss=0.01018, ecapa_loss=0.000137, whisper_loss=0.08099, over 20060.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.000139, whisper_loss=0.09025, over 3811207.02 frames. ], batch size: 81, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:24:16,511 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 17:24:33,719 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 17 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-20 17:24:39,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4886480.0, ans=0.0 2024-08-20 17:24:41,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.260e+01 2.488e+01 2.810e+01 3.938e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-20 17:24:45,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4886480.0, ans=0.2 2024-08-20 17:25:08,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4886580.0, ans=0.125 2024-08-20 17:25:16,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4886580.0, ans=0.125 2024-08-20 17:25:17,486 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 17:25:31,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4886680.0, ans=0.1 2024-08-20 17:25:31,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4886680.0, ans=0.1 2024-08-20 17:25:40,962 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14500, loss[loss=0.1042, beats_loss=0.008753, ecapa_loss=0.0001521, whisper_loss=0.09391, over 17318.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001381, whisper_loss=0.09046, over 3829586.14 frames. ], batch size: 70, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:25:43,171 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 17:25:45,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-08-20 17:25:49,070 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 31 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 17:26:01,350 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 17:27:25,742 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14550, loss[loss=0.09299, beats_loss=0.01124, ecapa_loss=0.000162, whisper_loss=0.08012, over 20768.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0103, ecapa_loss=0.0001387, whisper_loss=0.09043, over 3817753.76 frames. ], batch size: 85, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:27:40,416 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 17:27:53,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4887380.0, ans=0.125 2024-08-20 17:28:11,720 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.302e+01 2.517e+01 2.766e+01 3.665e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-20 17:28:56,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4887680.0, ans=0.2 2024-08-20 17:29:05,233 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-08-20 17:29:15,854 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14600, loss[loss=0.09566, beats_loss=0.01197, ecapa_loss=0.0001139, whisper_loss=0.08256, over 22472.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001388, whisper_loss=0.08983, over 3837723.28 frames. 
], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:29:30,297 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 17:29:38,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4887880.0, ans=0.1 2024-08-20 17:29:41,720 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 33 from LS+wenet, 9 from Vox, 37 fro AS 2024-08-20 17:30:14,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4887980.0, ans=0.0 2024-08-20 17:30:50,792 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 17:31:02,582 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14650, loss[loss=0.1036, beats_loss=0.01115, ecapa_loss=0.0001093, whisper_loss=0.09132, over 23360.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001387, whisper_loss=0.08985, over 3873595.09 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:31:11,499 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 17:31:48,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.228e+01 2.454e+01 2.785e+01 6.684e+01, threshold=4.907e+01, percent-clipped=2.0 2024-08-20 17:32:36,895 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-08-20 17:32:40,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.62 vs. 
limit=12.0 2024-08-20 17:32:44,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4888680.0, ans=0.125 2024-08-20 17:32:51,507 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14700, loss[loss=0.07979, beats_loss=0.01151, ecapa_loss=0.0001811, whisper_loss=0.06647, over 21403.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001398, whisper_loss=0.08962, over 3863019.77 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:32:54,931 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4888780.0, ans=0.1 2024-08-20 17:33:01,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4888780.0, ans=0.125 2024-08-20 17:33:16,788 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-20 17:33:31,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4888980.0, ans=0.125 2024-08-20 17:33:38,408 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2024-08-20 17:34:33,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4889280.0, ans=0.0 2024-08-20 17:34:34,571 INFO [train_multi_KD3.py:1117] (2/4) Epoch 33, batch 14750, loss[loss=0.1012, beats_loss=0.01151, ecapa_loss=0.0001352, whisper_loss=0.08836, over 15620.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001396, whisper_loss=0.08986, over 3880768.17 frames. 
], batch size: 62, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:34:54,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4889380.0, ans=0.125 2024-08-20 17:34:56,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4889380.0, ans=0.125 2024-08-20 17:34:57,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2024-08-20 17:35:00,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4889380.0, ans=0.1 2024-08-20 17:35:05,459 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 17 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 17:35:06,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4889380.0, ans=0.0 2024-08-20 17:35:11,093 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4889380.0, ans=0.125 2024-08-20 17:35:17,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.334e+01 2.652e+01 2.948e+01 4.454e+01, threshold=5.304e+01, percent-clipped=0.0 2024-08-20 17:35:19,675 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-20 17:35:22,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4889480.0, ans=0.0 2024-08-20 17:35:34,255 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 17:35:43,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4889580.0, ans=0.125 2024-08-20 17:36:34,080 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 0, loss[loss=0.1291, beats_loss=0.008031, ecapa_loss=0.000136, whisper_loss=0.1197, over 23092.00 frames. ], tot_loss[loss=0.1291, beats_loss=0.008031, ecapa_loss=0.000136, whisper_loss=0.1197, over 23092.00 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:36:34,081 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 17:37:09,396 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2495, over 931116.00 frames. 2024-08-20 17:37:31,882 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 17:39:14,594 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 17:39:14,602 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 17:39:38,897 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 17:40:43,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4889990.0, ans=0.0 2024-08-20 17:41:19,467 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 50, loss[loss=0.08276, beats_loss=0.008584, ecapa_loss=0.0001847, whisper_loss=0.07233, over 14001.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.008994, ecapa_loss=0.0001444, whisper_loss=0.09077, over 866332.98 frames. 
], batch size: 56, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:41:20,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4890190.0, ans=0.2 2024-08-20 17:41:27,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4890190.0, ans=0.125 2024-08-20 17:41:36,963 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 27 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 17:41:47,309 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=15.0 2024-08-20 17:41:49,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4890290.0, ans=0.125 2024-08-20 17:41:52,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4890290.0, ans=0.125 2024-08-20 17:41:59,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4890290.0, ans=0.125 2024-08-20 17:42:03,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4890290.0, ans=0.2 2024-08-20 17:42:12,479 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.935e+00 2024-08-20 17:42:16,391 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 17:42:33,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.429e+01 2.698e+01 2.927e+01 5.810e+01, threshold=5.396e+01, percent-clipped=1.0 2024-08-20 17:42:45,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4890490.0, ans=0.0 2024-08-20 17:42:48,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4890490.0, ans=0.125 2024-08-20 17:42:52,049 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 17:42:58,641 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4890590.0, ans=0.125 2024-08-20 17:43:13,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4890590.0, ans=0.125 2024-08-20 17:43:14,021 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 17:43:23,667 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 100, loss[loss=0.1048, beats_loss=0.007899, ecapa_loss=0.000128, whisper_loss=0.09559, over 19203.00 frames. ], tot_loss[loss=0.09942, beats_loss=0.00905, ecapa_loss=0.000143, whisper_loss=0.08894, over 1521369.84 frames. ], batch size: 71, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:43:52,836 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
22 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 17:44:25,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4890890.0, ans=0.0 2024-08-20 17:44:38,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4890890.0, ans=0.125 2024-08-20 17:45:02,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4890990.0, ans=0.125 2024-08-20 17:45:05,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4890990.0, ans=0.0 2024-08-20 17:45:23,761 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:45:26,075 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.025e+01 2024-08-20 17:45:30,532 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2024-08-20 17:45:30,984 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 150, loss[loss=0.1043, beats_loss=0.009015, ecapa_loss=0.0001567, whisper_loss=0.09372, over 18967.00 frames. ], tot_loss[loss=0.09958, beats_loss=0.009178, ecapa_loss=0.0001417, whisper_loss=0.08899, over 2010365.41 frames. ], batch size: 78, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:45:56,591 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 17:45:58,644 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 17:46:12,914 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4891390.0, ans=0.125 2024-08-20 17:46:28,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4891390.0, ans=0.0 2024-08-20 17:46:35,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.414e+01 2.624e+01 2.993e+01 4.090e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-20 17:46:41,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-20 17:46:46,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.86 vs. limit=10.0 2024-08-20 17:46:49,101 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 23 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-20 17:47:15,756 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 200, loss[loss=0.1067, beats_loss=0.01052, ecapa_loss=0.0001259, whisper_loss=0.09488, over 13874.00 frames. ], tot_loss[loss=0.09932, beats_loss=0.009558, ecapa_loss=0.0001417, whisper_loss=0.08834, over 2392239.54 frames. ], batch size: 53, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:47:19,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4891690.0, ans=0.125 2024-08-20 17:47:38,375 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 11 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 17:47:43,362 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 17:47:50,905 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 17:47:59,873 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 17:48:22,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4891990.0, ans=0.125 2024-08-20 17:48:50,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4892190.0, ans=0.125 2024-08-20 17:48:50,913 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 250, loss[loss=0.09207, beats_loss=0.008506, ecapa_loss=0.0001228, whisper_loss=0.08233, over 19966.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009798, ecapa_loss=0.0001402, whisper_loss=0.08909, over 2709801.31 frames. ], batch size: 75, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:48:56,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4892190.0, ans=0.2 2024-08-20 17:48:58,976 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 17:49:23,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=15.0 2024-08-20 17:49:24,496 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 
18 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-20 17:49:32,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4892390.0, ans=0.125 2024-08-20 17:49:45,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4892390.0, ans=0.125 2024-08-20 17:49:48,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.222e+01 2.433e+01 2.766e+01 4.202e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-20 17:50:23,574 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 300, loss[loss=0.09181, beats_loss=0.01107, ecapa_loss=0.0001713, whisper_loss=0.07903, over 17238.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009959, ecapa_loss=0.00014, whisper_loss=0.08923, over 2945748.60 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:50:37,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4892690.0, ans=0.1 2024-08-20 17:51:10,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4892890.0, ans=0.125 2024-08-20 17:51:12,712 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 30 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-20 17:51:32,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4892990.0, ans=0.0 2024-08-20 17:51:55,174 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 350, loss[loss=0.09865, beats_loss=0.01187, ecapa_loss=0.0001231, whisper_loss=0.08555, over 20659.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01013, ecapa_loss=0.0001396, whisper_loss=0.08886, over 3118831.74 frames. 
], batch size: 80, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:52:13,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4893290.0, ans=0.0 2024-08-20 17:52:20,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4893290.0, ans=0.125 2024-08-20 17:52:25,053 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 17 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 17:52:29,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4893290.0, ans=0.2 2024-08-20 17:52:35,301 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 39 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 17:52:35,539 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4893390.0, ans=0.0 2024-08-20 17:52:37,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4893390.0, ans=0.0 2024-08-20 17:52:39,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4893390.0, ans=0.125 2024-08-20 17:52:48,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.330e+01 2.558e+01 2.874e+01 1.855e+02, threshold=5.116e+01, percent-clipped=1.0 2024-08-20 17:52:49,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4893490.0, ans=0.2 2024-08-20 17:52:53,141 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 17:53:08,958 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4893590.0, ans=0.0 2024-08-20 17:53:18,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-20 17:53:25,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4893690.0, ans=0.0 2024-08-20 17:53:26,339 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 400, loss[loss=0.1061, beats_loss=0.008974, ecapa_loss=0.0001427, whisper_loss=0.09566, over 22376.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0102, ecapa_loss=0.0001396, whisper_loss=0.08847, over 3250790.66 frames. ], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:53:33,178 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 17:53:37,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4893690.0, ans=0.0 2024-08-20 17:53:37,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4893690.0, ans=0.125 2024-08-20 17:53:40,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4893690.0, ans=0.0 2024-08-20 17:53:41,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-20 17:53:45,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-20 17:53:45,751 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 17:53:49,500 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-20 17:54:04,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4893890.0, ans=0.2 2024-08-20 17:54:20,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4893990.0, ans=0.0 2024-08-20 17:54:34,455 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 17:54:35,676 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2024-08-20 17:54:45,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-08-20 17:54:54,323 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 17:54:55,810 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 450, loss[loss=0.09887, beats_loss=0.01002, ecapa_loss=0.0001316, whisper_loss=0.08753, over 18766.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01024, ecapa_loss=0.0001392, whisper_loss=0.08888, over 3377713.89 frames. 
], batch size: 74, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:55:01,806 WARNING [optim.py:496] (2/4) Scaling gradients by 0.014215901494026184, model_norm_threshold=51.16255569458008 2024-08-20 17:55:01,975 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.649e+06, grad_sumsq=5.014e+05, orig_rms_sq=3.288e+00 2024-08-20 17:55:20,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4894290.0, ans=10.0 2024-08-20 17:55:22,554 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4894290.0, ans=0.0 2024-08-20 17:55:35,980 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-20 17:55:49,495 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 17:55:51,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.355e+01 2.533e+01 2.775e+01 3.599e+03, threshold=5.067e+01, percent-clipped=2.0 2024-08-20 17:56:27,829 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 500, loss[loss=0.1017, beats_loss=0.01002, ecapa_loss=0.0001236, whisper_loss=0.09049, over 20154.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01023, ecapa_loss=0.0001386, whisper_loss=0.08905, over 3471505.94 frames. ], batch size: 79, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:57:31,583 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 17:57:44,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4895090.0, ans=0.2 2024-08-20 17:57:58,251 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 550, loss[loss=0.1034, beats_loss=0.009748, ecapa_loss=0.0001386, whisper_loss=0.09222, over 18295.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01018, ecapa_loss=0.0001393, whisper_loss=0.08915, over 3539798.89 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:58:04,480 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 17:58:18,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-20 17:58:28,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4895290.0, ans=0.2 2024-08-20 17:58:33,651 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2024-08-20 17:58:46,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4895390.0, ans=0.2 2024-08-20 17:58:52,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.257e+01 2.490e+01 2.718e+01 3.602e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 17:59:18,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4895590.0, ans=0.125 2024-08-20 17:59:27,724 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 600, loss[loss=0.09518, beats_loss=0.01017, ecapa_loss=0.000146, whisper_loss=0.08355, over 15567.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01015, ecapa_loss=0.0001391, whisper_loss=0.08916, over 3590481.54 frames. ], batch size: 64, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:59:39,743 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 17:59:47,818 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2024-08-20 17:59:50,635 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 17 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-20 18:00:19,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4895990.0, ans=0.0 2024-08-20 18:00:40,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4896090.0, ans=0.125 2024-08-20 18:00:53,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4896090.0, ans=0.0 2024-08-20 18:00:56,119 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:00:56,858 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 650, loss[loss=0.1029, beats_loss=0.009408, ecapa_loss=0.0001807, whisper_loss=0.0917, over 21303.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01011, ecapa_loss=0.0001391, whisper_loss=0.08947, over 3595859.20 frames. 
], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:01:20,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4896290.0, ans=0.125 2024-08-20 18:01:34,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4896390.0, ans=0.125 2024-08-20 18:01:51,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.174e+01 2.397e+01 2.738e+01 4.303e+01, threshold=4.793e+01, percent-clipped=0.0 2024-08-20 18:01:55,335 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 18:02:06,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4896490.0, ans=0.125 2024-08-20 18:02:26,544 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 700, loss[loss=0.08004, beats_loss=0.008886, ecapa_loss=0.0001434, whisper_loss=0.06972, over 14956.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01012, ecapa_loss=0.000139, whisper_loss=0.0901, over 3644664.35 frames. ], batch size: 58, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:02:41,253 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 18:03:09,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4896890.0, ans=0.0 2024-08-20 18:03:19,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.08 vs. limit=22.5 2024-08-20 18:03:21,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4896990.0, ans=0.125 2024-08-20 18:03:39,191 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 18:03:57,682 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 750, loss[loss=0.09573, beats_loss=0.01279, ecapa_loss=0.0001096, whisper_loss=0.08185, over 22668.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01021, ecapa_loss=0.0001381, whisper_loss=0.09026, over 3677974.22 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:04:01,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4897190.0, ans=0.125 2024-08-20 18:04:15,440 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 18:04:15,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4897290.0, ans=0.2 2024-08-20 18:04:23,130 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 18:04:49,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4897490.0, ans=0.0 2024-08-20 18:04:50,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.248e+01 2.436e+01 2.663e+01 4.558e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-20 18:05:01,346 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 18:05:04,381 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2024-08-20 18:05:05,124 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 18:05:25,689 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 800, loss[loss=0.1031, beats_loss=0.009118, ecapa_loss=0.0001416, whisper_loss=0.09257, over 21873.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01019, ecapa_loss=0.0001383, whisper_loss=0.08986, over 3701599.08 frames. ], batch size: 85, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:05:32,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4897690.0, ans=0.125 2024-08-20 18:05:38,261 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-20 18:05:45,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4897790.0, ans=0.09899494936611666 2024-08-20 18:05:54,826 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4897790.0, ans=0.125 2024-08-20 18:06:06,532 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 18:06:16,526 WARNING [optim.py:496] (2/4) Scaling gradients by 0.034612394869327545, model_norm_threshold=48.72909927368164 2024-08-20 18:06:16,695 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.559e+05, grad_sumsq=3.559e+05, orig_rms_sq=1.000e+00 2024-08-20 18:06:23,886 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-20 18:06:43,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4898090.0, ans=0.0 2024-08-20 18:06:49,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.65 vs. 
limit=15.0 2024-08-20 18:06:52,582 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 850, loss[loss=0.1041, beats_loss=0.01056, ecapa_loss=0.0001517, whisper_loss=0.092, over 23178.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0103, ecapa_loss=0.0001363, whisper_loss=0.08892, over 3725364.87 frames. ], batch size: 97, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:07:02,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4898190.0, ans=0.125 2024-08-20 18:07:09,368 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-20 18:07:14,045 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 18:07:24,308 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 18:07:35,703 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-20 18:07:45,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.491e+01 2.263e+01 2.466e+01 2.785e+01 1.408e+03, threshold=4.933e+01, percent-clipped=1.0 2024-08-20 18:07:47,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0 2024-08-20 18:07:52,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4898490.0, ans=0.125 2024-08-20 18:08:02,431 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 18:08:05,533 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.96 vs. 
limit=22.5 2024-08-20 18:08:22,073 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 900, loss[loss=0.09681, beats_loss=0.0103, ecapa_loss=0.0001243, whisper_loss=0.08527, over 21069.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01026, ecapa_loss=0.0001363, whisper_loss=0.08826, over 3716572.92 frames. ], batch size: 84, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:08:28,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4898690.0, ans=0.125 2024-08-20 18:08:32,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4898690.0, ans=0.0 2024-08-20 18:08:34,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2024-08-20 18:08:34,088 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2024-08-20 18:08:36,098 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4898690.0, ans=0.0 2024-08-20 18:08:57,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4898890.0, ans=0.125 2024-08-20 18:09:03,828 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-20 18:09:27,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4898990.0, ans=0.0 2024-08-20 18:09:52,098 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 950, loss[loss=0.09357, beats_loss=0.01228, ecapa_loss=0.0001361, whisper_loss=0.07992, over 22318.00 frames. 
], tot_loss[loss=0.09979, beats_loss=0.01027, ecapa_loss=0.0001366, whisper_loss=0.08816, over 3745642.50 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:09:54,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4899190.0, ans=0.125 2024-08-20 18:10:19,948 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 11 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 18:10:27,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4899390.0, ans=0.2 2024-08-20 18:10:31,705 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 33 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 18:10:40,106 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03566131740808487, model_norm_threshold=49.32598114013672 2024-08-20 18:10:40,275 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.124e+05, grad_sumsq=3.124e+05, orig_rms_sq=1.000e+00 2024-08-20 18:10:43,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.227e+01 2.460e+01 2.712e+01 1.383e+03, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 18:10:48,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4899490.0, ans=0.1 2024-08-20 18:11:05,724 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 18 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 18:11:20,425 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1000, loss[loss=0.09186, beats_loss=0.01106, ecapa_loss=0.0001009, whisper_loss=0.07979, over 23396.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01026, ecapa_loss=0.000137, whisper_loss=0.08832, over 3740161.46 frames. 
], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:11:36,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4899690.0, ans=0.125 2024-08-20 18:11:45,854 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 14 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 18:11:51,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4899790.0, ans=0.125 2024-08-20 18:11:52,662 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 18:11:53,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4899790.0, ans=0.1 2024-08-20 18:12:25,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4899990.0, ans=0.0 2024-08-20 18:12:29,064 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 18:12:38,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4900090.0, ans=0.125 2024-08-20 18:12:50,726 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1050, loss[loss=0.09273, beats_loss=0.009385, ecapa_loss=0.0001604, whisper_loss=0.08174, over 15747.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01025, ecapa_loss=0.0001371, whisper_loss=0.08884, over 3720179.65 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:13:10,821 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
28 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 18:13:12,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4900290.0, ans=0.125 2024-08-20 18:13:31,771 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 18:13:38,532 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 18:13:43,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.230e+01 2.407e+01 2.713e+01 3.528e+01, threshold=4.815e+01, percent-clipped=0.0 2024-08-20 18:13:47,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4900490.0, ans=0.125 2024-08-20 18:13:52,251 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 18:13:52,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4900490.0, ans=0.2 2024-08-20 18:13:57,017 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 26 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 18:14:00,392 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 18:14:00,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4900590.0, ans=0.2 2024-08-20 18:14:17,683 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1100, loss[loss=0.1077, beats_loss=0.01255, ecapa_loss=0.0001214, whisper_loss=0.09389, over 23487.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01024, ecapa_loss=0.0001366, whisper_loss=0.08896, over 3733723.56 frames. ], batch size: 94, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:14:56,827 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
12 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 18:15:28,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4901090.0, ans=0.0 2024-08-20 18:15:32,627 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 12 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 18:15:44,766 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1150, loss[loss=0.1155, beats_loss=0.007315, ecapa_loss=0.0001409, whisper_loss=0.1067, over 24227.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01021, ecapa_loss=0.0001365, whisper_loss=0.08951, over 3747789.01 frames. ], batch size: 93, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:16:35,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4901390.0, ans=0.125 2024-08-20 18:16:38,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.288e+01 2.529e+01 2.855e+01 5.753e+01, threshold=5.059e+01, percent-clipped=2.0 2024-08-20 18:16:48,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4901490.0, ans=0.07 2024-08-20 18:17:06,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4901590.0, ans=0.125 2024-08-20 18:17:14,347 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1200, loss[loss=0.09102, beats_loss=0.01155, ecapa_loss=0.0001154, whisper_loss=0.07832, over 20908.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001358, whisper_loss=0.08934, over 3790668.33 frames. ], batch size: 84, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:17:22,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4901690.0, ans=0.0 2024-08-20 18:17:37,580 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
11 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 18:18:07,828 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:18:15,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4901990.0, ans=0.125 2024-08-20 18:18:23,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-20 18:18:43,503 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1250, loss[loss=0.118, beats_loss=0.008044, ecapa_loss=0.0001318, whisper_loss=0.1087, over 21663.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0104, ecapa_loss=0.0001356, whisper_loss=0.08891, over 3788868.54 frames. ], batch size: 83, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:18:45,685 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 25 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-20 18:18:57,058 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 18:19:35,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.263e+01 2.557e+01 2.836e+01 4.039e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-20 18:19:52,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4902590.0, ans=0.0 2024-08-20 18:19:54,495 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4902590.0, ans=0.1 2024-08-20 18:20:11,669 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1300, loss[loss=0.0973, beats_loss=0.01161, ecapa_loss=0.0001456, whisper_loss=0.08424, over 13755.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01039, ecapa_loss=0.0001363, whisper_loss=0.08895, over 3807627.47 frames. 
], batch size: 54, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:20:35,399 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 18:20:41,297 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2024-08-20 18:20:50,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2024-08-20 18:21:25,788 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 18:21:31,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4903090.0, ans=0.125 2024-08-20 18:21:40,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4903190.0, ans=0.125 2024-08-20 18:21:41,565 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1350, loss[loss=0.1132, beats_loss=0.008497, ecapa_loss=8.846e-05, whisper_loss=0.1038, over 16494.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001357, whisper_loss=0.08915, over 3800937.74 frames. ], batch size: 56, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:21:43,692 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 18:21:51,687 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2024-08-20 18:21:54,764 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.73 vs. 
limit=22.5 2024-08-20 18:22:10,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4903290.0, ans=0.125 2024-08-20 18:22:13,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4903290.0, ans=0.125 2024-08-20 18:22:16,892 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 25 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 18:22:18,895 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 26 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 18:22:31,350 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 28 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 18:22:34,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.265e+01 2.449e+01 2.794e+01 7.955e+01, threshold=4.899e+01, percent-clipped=1.0 2024-08-20 18:22:44,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4903490.0, ans=0.0 2024-08-20 18:22:50,335 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=22.5 2024-08-20 18:23:04,186 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4903590.0, ans=0.125 2024-08-20 18:23:05,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4903590.0, ans=0.1 2024-08-20 18:23:10,312 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1400, loss[loss=0.1103, beats_loss=0.01003, ecapa_loss=0.0001677, whisper_loss=0.09863, over 18937.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001348, whisper_loss=0.08941, over 3797658.00 frames. 
], batch size: 77, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:23:12,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4903690.0, ans=0.125 2024-08-20 18:23:26,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4903790.0, ans=0.125 2024-08-20 18:23:34,524 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 26 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-20 18:23:38,373 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 28 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-20 18:24:26,393 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 22 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-20 18:24:35,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4904090.0, ans=0.2 2024-08-20 18:24:36,637 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 18:24:38,117 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1450, loss[loss=0.1086, beats_loss=0.009745, ecapa_loss=0.0001347, whisper_loss=0.09751, over 22215.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001349, whisper_loss=0.08952, over 3802195.72 frames. ], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:24:44,715 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.63 vs. 
limit=15.0 2024-08-20 18:24:55,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4904290.0, ans=0.0 2024-08-20 18:25:04,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4904290.0, ans=0.125 2024-08-20 18:25:07,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4904290.0, ans=0.125 2024-08-20 18:25:14,126 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 18:25:14,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4904390.0, ans=0.0 2024-08-20 18:25:21,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4904390.0, ans=0.1 2024-08-20 18:25:27,474 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-08-20 18:25:32,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.130e+01 2.447e+01 2.778e+01 4.776e+01, threshold=4.894e+01, percent-clipped=0.0 2024-08-20 18:26:20,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4904590.0, ans=0.125 2024-08-20 18:26:22,014 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 18:26:22,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4904590.0, ans=0.125 2024-08-20 18:26:27,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4904590.0, ans=0.2 2024-08-20 18:26:32,484 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1500, loss[loss=0.08359, beats_loss=0.0125, ecapa_loss=0.0001179, whisper_loss=0.06991, over 19495.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01029, ecapa_loss=0.0001352, whisper_loss=0.08908, over 3803365.69 frames. ], batch size: 77, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:26:46,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4904690.0, ans=0.125 2024-08-20 18:26:57,409 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 19 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-20 18:27:01,104 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4904790.0, ans=0.125 2024-08-20 18:27:08,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4904890.0, ans=0.125 2024-08-20 18:27:10,210 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 18:27:14,189 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 18:27:21,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4904890.0, ans=0.0 2024-08-20 18:27:26,468 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 18:27:42,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4904990.0, ans=0.025 2024-08-20 18:27:51,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4905090.0, ans=0.2 2024-08-20 18:27:56,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4905090.0, ans=0.2 2024-08-20 18:28:00,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4905090.0, ans=0.125 2024-08-20 18:28:04,913 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1550, loss[loss=0.1157, beats_loss=0.00888, ecapa_loss=0.0001397, whisper_loss=0.1054, over 23528.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.0001359, whisper_loss=0.08874, over 3778629.96 frames. ], batch size: 92, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:28:14,457 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-20 18:28:19,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4905190.0, ans=0.2 2024-08-20 18:28:19,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4905190.0, ans=0.125 2024-08-20 18:28:23,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4905290.0, ans=0.125 2024-08-20 18:28:41,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. 
limit=8.0 2024-08-20 18:28:44,018 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 18:28:58,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-20 18:28:59,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4905490.0, ans=0.0 2024-08-20 18:29:01,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.222e+01 2.378e+01 2.673e+01 8.948e+01, threshold=4.757e+01, percent-clipped=1.0 2024-08-20 18:29:01,261 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 29 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 18:29:14,923 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 18:29:33,047 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 15 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 18:29:38,724 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1600, loss[loss=0.1284, beats_loss=0.008108, ecapa_loss=0.0001343, whisper_loss=0.1189, over 23443.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01024, ecapa_loss=0.0001364, whisper_loss=0.08895, over 3747167.19 frames. 
], batch size: 88, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:29:43,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4905690.0, ans=0.125 2024-08-20 18:30:25,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4905890.0, ans=0.2 2024-08-20 18:30:33,981 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0239420123398304, model_norm_threshold=47.56806564331055 2024-08-20 18:30:34,149 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.639e+05, grad_sumsq=5.639e+05, orig_rms_sq=1.000e+00 2024-08-20 18:30:48,350 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.160e+05 2024-08-20 18:30:57,147 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 18:31:02,016 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05985227972269058, model_norm_threshold=47.56806564331055 2024-08-20 18:31:02,184 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.784e+04, grad_sumsq=6.784e+04, orig_rms_sq=1.000e+00 2024-08-20 18:31:02,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4906090.0, ans=0.125 2024-08-20 18:31:10,632 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1650, loss[loss=0.08536, beats_loss=0.01149, ecapa_loss=0.0001312, whisper_loss=0.07255, over 14225.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001361, whisper_loss=0.08955, over 3757778.98 frames. 
], batch size: 57, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:31:43,956 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4906290.0, ans=0.1 2024-08-20 18:31:56,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-20 18:32:04,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.417e+01 2.742e+01 3.192e+01 1.987e+03, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 18:32:14,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4906490.0, ans=0.125 2024-08-20 18:32:17,223 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 16 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-20 18:32:24,313 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 18:32:27,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4906590.0, ans=0.0 2024-08-20 18:32:30,071 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-08-20 18:32:35,017 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4906590.0, ans=0.125 2024-08-20 18:32:39,582 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1700, loss[loss=0.07863, beats_loss=0.01302, ecapa_loss=9.98e-05, whisper_loss=0.06461, over 20412.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001361, whisper_loss=0.08975, over 3772971.32 frames. 
], batch size: 78, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:32:45,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4906690.0, ans=0.0 2024-08-20 18:32:50,992 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 18:32:59,125 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-20 18:33:10,167 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4906790.0, ans=0.04949747468305833 2024-08-20 18:33:17,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4906890.0, ans=0.125 2024-08-20 18:33:18,032 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-20 18:33:23,808 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-08-20 18:33:41,572 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4906990.0, ans=0.125 2024-08-20 18:33:43,031 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 18:33:53,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4907090.0, ans=0.05 2024-08-20 18:33:57,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. 
limit=22.5 2024-08-20 18:34:05,571 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.81 vs. limit=6.0 2024-08-20 18:34:11,704 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1750, loss[loss=0.08785, beats_loss=0.009303, ecapa_loss=0.0001741, whisper_loss=0.0768, over 13622.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001365, whisper_loss=0.0899, over 3769067.30 frames. ], batch size: 54, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:34:16,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-20 18:34:23,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-20 18:34:25,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4907190.0, ans=0.0 2024-08-20 18:34:30,216 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
17 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 18:34:52,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4907390.0, ans=0.125 2024-08-20 18:35:02,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4907390.0, ans=0.125 2024-08-20 18:35:02,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4907390.0, ans=0.0 2024-08-20 18:35:05,589 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.254e+01 2.590e+01 2.933e+01 3.656e+02, threshold=5.181e+01, percent-clipped=1.0 2024-08-20 18:35:40,830 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1800, loss[loss=0.09828, beats_loss=0.01103, ecapa_loss=0.0001436, whisper_loss=0.08582, over 22337.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001367, whisper_loss=0.08926, over 3770065.39 frames. ], batch size: 89, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:35:51,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4907690.0, ans=0.125 2024-08-20 18:35:54,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4907690.0, ans=0.125 2024-08-20 18:36:10,304 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 18:36:26,466 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-20 18:36:26,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4907890.0, ans=0.2 2024-08-20 18:37:00,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.57 vs. 
limit=22.5
2024-08-20 18:37:01,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4908090.0, ans=0.125
2024-08-20 18:37:04,941 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4908090.0, ans=0.125
2024-08-20 18:37:09,342 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1850, loss[loss=0.09059, beats_loss=0.01174, ecapa_loss=0.0001221, whisper_loss=0.07763, over 17351.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01033, ecapa_loss=0.0001361, whisper_loss=0.08848, over 3731524.08 frames. ], batch size: 68, lr: 1.81e-03, grad_scale: 1.152921504606847e+18
2024-08-20 18:37:22,105 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 28 from LS+wenet, 13 from Vox, 25 from AS
2024-08-20 18:37:30,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0
2024-08-20 18:37:39,915 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 33 from LS+wenet, 20 from Vox, 27 from AS
2024-08-20 18:37:47,470 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-20 18:38:03,690 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 18:38:04,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.237e+01 2.438e+01 2.771e+01 3.802e+01, threshold=4.876e+01, percent-clipped=0.0
2024-08-20 18:38:13,324 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 from AS
2024-08-20 18:38:15,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4908490.0, ans=0.2
2024-08-20 18:38:15,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4908490.0, ans=0.125
2024-08-20 18:38:30,249 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 from AS
2024-08-20 18:38:39,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.66 vs. limit=22.5
2024-08-20 18:38:43,212 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1900, loss[loss=0.08202, beats_loss=0.006644, ecapa_loss=0.0001834, whisper_loss=0.07354, over 13060.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01027, ecapa_loss=0.000136, whisper_loss=0.08847, over 3746016.04 frames. ], batch size: 49, lr: 1.81e-03, grad_scale: 1.152921504606847e+18
2024-08-20 18:39:06,469 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 from AS
2024-08-20 18:39:08,364 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4908790.0, ans=0.125
2024-08-20 18:39:18,092 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.787e-01
2024-08-20 18:40:18,054 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 1950, loss[loss=0.09682, beats_loss=0.007967, ecapa_loss=0.0001659, whisper_loss=0.08719, over 18160.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01031, ecapa_loss=0.0001363, whisper_loss=0.08829, over 3764430.11 frames. ], batch size: 74, lr: 1.81e-03, grad_scale: 1.152921504606847e+18
2024-08-20 18:40:22,531 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 18:40:24,307 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 from AS
2024-08-20 18:40:37,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4909290.0, ans=0.125
2024-08-20 18:40:48,153 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 from AS
2024-08-20 18:41:00,664 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0
2024-08-20 18:41:14,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.290e+01 2.558e+01 2.755e+01 1.117e+02, threshold=5.115e+01, percent-clipped=1.0
2024-08-20 18:41:31,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4909590.0, ans=0.1
2024-08-20 18:41:44,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4909590.0, ans=0.2
2024-08-20 18:41:50,636 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2000, loss[loss=0.09248, beats_loss=0.00937, ecapa_loss=0.0001133, whisper_loss=0.08198, over 14410.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01028, ecapa_loss=0.000135, whisper_loss=0.08892, over 3756409.45 frames. ], batch size: 53, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:41:51,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4909690.0, ans=0.0
2024-08-20 18:41:51,580 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0
2024-08-20 18:42:04,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4909690.0, ans=0.1
2024-08-20 18:42:11,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4909790.0, ans=0.125
2024-08-20 18:42:34,928 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4909890.0, ans=0.125
2024-08-20 18:43:01,331 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 from AS
2024-08-20 18:43:10,303 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS
2024-08-20 18:43:14,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4910090.0, ans=0.125
2024-08-20 18:43:20,489 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2050, loss[loss=0.1019, beats_loss=0.009589, ecapa_loss=0.0001271, whisper_loss=0.091, over 18395.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01029, ecapa_loss=0.0001344, whisper_loss=0.08961, over 3774404.20 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:43:26,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4910190.0, ans=0.1
2024-08-20 18:43:26,412 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0
2024-08-20 18:43:30,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4910190.0, ans=0.09899494936611666
2024-08-20 18:43:43,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4910290.0, ans=0.07
2024-08-20 18:43:48,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4910290.0, ans=0.2
2024-08-20 18:43:51,530 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 from AS
2024-08-20 18:43:52,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4910290.0, ans=0.125
2024-08-20 18:44:10,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4910390.0, ans=0.125
2024-08-20 18:44:13,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4910490.0, ans=0.125
2024-08-20 18:44:14,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.216e+01 2.451e+01 2.843e+01 3.787e+02, threshold=4.902e+01, percent-clipped=2.0
2024-08-20 18:44:18,935 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 18:44:19,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4910490.0, ans=0.1
2024-08-20 18:44:33,185 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 12 from LS+wenet, 17 from Vox, 35 from AS
2024-08-20 18:44:40,318 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 from AS
2024-08-20 18:44:43,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=4910590.0, ans=12.0
2024-08-20 18:44:46,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4910590.0, ans=0.125
2024-08-20 18:44:49,002 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2100, loss[loss=0.08167, beats_loss=0.01123, ecapa_loss=0.0001146, whisper_loss=0.0693, over 18059.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01034, ecapa_loss=0.0001349, whisper_loss=0.0892, over 3754276.53 frames. ], batch size: 71, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:44:51,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4910690.0, ans=0.125
2024-08-20 18:45:00,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4910690.0, ans=0.125
2024-08-20 18:45:14,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4910790.0, ans=0.2
2024-08-20 18:45:20,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4910790.0, ans=0.0
2024-08-20 18:45:25,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0
2024-08-20 18:45:37,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4910890.0, ans=0.0
2024-08-20 18:46:16,845 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 32 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 18:46:17,947 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2150, loss[loss=0.1379, beats_loss=0.007397, ecapa_loss=0.0001276, whisper_loss=0.1292, over 19301.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001355, whisper_loss=0.08943, over 3765380.05 frames. ], batch size: 69, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:46:29,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0
2024-08-20 18:47:01,558 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0
2024-08-20 18:47:04,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4911390.0, ans=0.125
2024-08-20 18:47:09,446 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05905143544077873, model_norm_threshold=49.024410247802734
2024-08-20 18:47:09,616 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.632e+04, grad_sumsq=6.177e+06, orig_rms_sq=1.074e-02
2024-08-20 18:47:11,639 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 from AS
2024-08-20 18:47:12,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.265e+01 2.540e+01 2.946e+01 8.302e+02, threshold=5.079e+01, percent-clipped=3.0
2024-08-20 18:47:23,075 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 from AS
2024-08-20 18:47:46,285 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2200, loss[loss=0.09398, beats_loss=0.01097, ecapa_loss=0.0001003, whisper_loss=0.08201, over 15392.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01028, ecapa_loss=0.0001369, whisper_loss=0.08925, over 3768123.94 frames. ], batch size: 57, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:47:49,618 WARNING [optim.py:496] (2/4) Scaling gradients by 0.06644881516695023, model_norm_threshold=50.791358947753906
2024-08-20 18:47:49,787 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.846e+04, grad_sumsq=7.846e+04, orig_rms_sq=1.000e+00
2024-08-20 18:47:55,453 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 23 from LS+wenet, 12 from Vox, 27 from AS
2024-08-20 18:48:12,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4911790.0, ans=0.035
2024-08-20 18:48:16,811 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0
2024-08-20 18:48:19,013 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS
2024-08-20 18:48:28,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4911890.0, ans=0.0
2024-08-20 18:49:17,688 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2250, loss[loss=0.1202, beats_loss=0.009056, ecapa_loss=0.0001379, whisper_loss=0.1098, over 19643.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0103, ecapa_loss=0.0001363, whisper_loss=0.08937, over 3765243.25 frames. ], batch size: 78, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:49:26,376 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 20 from LS+wenet, 24 from Vox, 21 from AS
2024-08-20 18:49:28,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4912190.0, ans=0.1
2024-08-20 18:49:57,717 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-20 18:50:03,434 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 17 from LS+wenet, 25 from Vox, 35 from AS
2024-08-20 18:50:14,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.207e+01 2.453e+01 2.665e+01 7.644e+02, threshold=4.907e+01, percent-clipped=1.0
2024-08-20 18:50:14,982 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.316e+01
2024-08-20 18:50:19,665 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS
2024-08-20 18:50:28,233 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 from AS
2024-08-20 18:50:35,488 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS
2024-08-20 18:50:43,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4912590.0, ans=0.125
2024-08-20 18:50:48,187 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2300, loss[loss=0.1054, beats_loss=0.01057, ecapa_loss=0.0001238, whisper_loss=0.09363, over 16708.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.000136, whisper_loss=0.08928, over 3786558.70 frames. ], batch size: 66, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:51:02,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4912690.0, ans=0.125
2024-08-20 18:51:03,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4912690.0, ans=0.125
2024-08-20 18:51:10,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=22.5
2024-08-20 18:51:15,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0
2024-08-20 18:51:17,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4912790.0, ans=0.0
2024-08-20 18:51:36,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4912890.0, ans=0.1
2024-08-20 18:51:36,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4912890.0, ans=0.04949747468305833
2024-08-20 18:51:49,181 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 11 from Vox, 30 from AS
2024-08-20 18:51:59,128 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 26 from LS+wenet, 13 from Vox, 25 from AS
2024-08-20 18:52:08,489 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.61 vs. limit=22.5
2024-08-20 18:52:17,059 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2350, loss[loss=0.1284, beats_loss=0.006629, ecapa_loss=0.0001623, whisper_loss=0.1201, over 17372.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001373, whisper_loss=0.08978, over 3810867.09 frames. ], batch size: 69, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:52:26,091 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 38 from LS+wenet, 15 from Vox, 30 from AS
2024-08-20 18:52:26,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4913190.0, ans=0.0
2024-08-20 18:52:28,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4913190.0, ans=0.125
2024-08-20 18:52:28,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0
2024-08-20 18:52:38,711 WARNING [optim.py:496] (2/4) Scaling gradients by 0.04937309771776199, model_norm_threshold=49.067115783691406
2024-08-20 18:52:38,879 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.547e+05, grad_sumsq=4.707e+04, orig_rms_sq=3.286e+00
2024-08-20 18:52:48,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4913290.0, ans=0.125
2024-08-20 18:53:14,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.360e+01 2.620e+01 2.900e+01 9.938e+02, threshold=5.241e+01, percent-clipped=3.0
2024-08-20 18:53:43,136 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 from AS
2024-08-20 18:53:49,349 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2400, loss[loss=0.08694, beats_loss=0.01269, ecapa_loss=0.0001443, whisper_loss=0.0728, over 22218.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001366, whisper_loss=0.08964, over 3816018.14 frames. ], batch size: 94, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:53:51,026 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 12 from LS+wenet, 16 from Vox, 26 from AS
2024-08-20 18:53:57,981 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 27 from LS+wenet, 15 from Vox, 21 from AS
2024-08-20 18:54:08,511 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4913790.0, ans=0.035
2024-08-20 18:54:08,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4913790.0, ans=0.125
2024-08-20 18:54:13,983 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0
2024-08-20 18:54:40,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0
2024-08-20 18:54:47,602 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS
2024-08-20 18:55:02,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4914090.0, ans=0.2
2024-08-20 18:55:19,318 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2450, loss[loss=0.1017, beats_loss=0.009903, ecapa_loss=0.0001357, whisper_loss=0.09039, over 16579.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001363, whisper_loss=0.08945, over 3785791.87 frames. ], batch size: 64, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:55:43,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4914290.0, ans=0.125
2024-08-20 18:55:48,390 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 from AS
2024-08-20 18:56:16,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.269e+01 2.495e+01 2.810e+01 4.376e+01, threshold=4.990e+01, percent-clipped=0.0
2024-08-20 18:56:18,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4914490.0, ans=0.0
2024-08-20 18:56:31,909 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 from AS
2024-08-20 18:56:36,617 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4914590.0, ans=0.125
2024-08-20 18:56:53,405 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2500, loss[loss=0.1048, beats_loss=0.009485, ecapa_loss=0.000148, whisper_loss=0.09385, over 22302.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001355, whisper_loss=0.0896, over 3795409.67 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:56:58,370 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 24 from LS+wenet, 26 from Vox, 35 from AS
2024-08-20 18:57:06,540 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 from AS
2024-08-20 18:57:15,930 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4914790.0, ans=0.07
2024-08-20 18:57:56,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4914990.0, ans=0.125
2024-08-20 18:58:19,738 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS
2024-08-20 18:58:21,727 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2550, loss[loss=0.1209, beats_loss=0.008676, ecapa_loss=0.0001411, whisper_loss=0.1108, over 22958.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001354, whisper_loss=0.09061, over 3799589.92 frames. ], batch size: 89, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:58:27,076 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS
2024-08-20 18:58:29,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4915190.0, ans=0.025
2024-08-20 18:58:29,573 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0
2024-08-20 18:58:40,254 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4915290.0, ans=0.125
2024-08-20 18:59:00,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4915390.0, ans=15.0
2024-08-20 18:59:19,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.358e+01 2.574e+01 2.752e+01 5.119e+01, threshold=5.148e+01, percent-clipped=1.0
2024-08-20 18:59:55,385 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2600, loss[loss=0.07681, beats_loss=0.01299, ecapa_loss=0.000113, whisper_loss=0.06268, over 18063.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001352, whisper_loss=0.09036, over 3813973.00 frames. ], batch size: 74, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:59:59,099 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 25 from LS+wenet, 20 from Vox, 19 from AS
2024-08-20 19:00:03,134 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 from AS
2024-08-20 19:00:05,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4915690.0, ans=0.125
2024-08-20 19:00:07,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4915690.0, ans=0.2
2024-08-20 19:00:07,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4915690.0, ans=0.125
2024-08-20 19:00:46,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4915890.0, ans=0.0
2024-08-20 19:00:47,782 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 from AS
2024-08-20 19:00:55,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4915990.0, ans=0.0
2024-08-20 19:01:15,898 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0
2024-08-20 19:01:16,027 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=12.0
2024-08-20 19:01:30,874 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2650, loss[loss=0.1178, beats_loss=0.01084, ecapa_loss=0.000116, whisper_loss=0.1058, over 19455.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.000136, whisper_loss=0.09036, over 3825954.22 frames. ], batch size: 75, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:01:32,745 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 from AS
2024-08-20 19:01:41,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4916190.0, ans=0.125
2024-08-20 19:01:58,444 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.06 vs. limit=22.5
2024-08-20 19:02:02,525 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0383872389793396, model_norm_threshold=51.48301696777344
2024-08-20 19:02:02,695 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.236e+05, grad_sumsq=2.236e+05, orig_rms_sq=1.000e+00
2024-08-20 19:02:03,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4916290.0, ans=0.0
2024-08-20 19:02:06,934 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 from AS
2024-08-20 19:02:07,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4916390.0, ans=0.1
2024-08-20 19:02:13,402 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS
2024-08-20 19:02:25,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.307e+01 2.525e+01 3.012e+01 1.341e+03, threshold=5.051e+01, percent-clipped=2.0
2024-08-20 19:02:26,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4916490.0, ans=0.0
2024-08-20 19:02:31,336 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 from AS
2024-08-20 19:02:42,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4916590.0, ans=0.125
2024-08-20 19:02:45,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4916590.0, ans=0.95
2024-08-20 19:02:51,202 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.487e+00
2024-08-20 19:02:59,059 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2700, loss[loss=0.09999, beats_loss=0.01129, ecapa_loss=0.0001414, whisper_loss=0.08728, over 23128.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001363, whisper_loss=0.09019, over 3819547.02 frames. ], batch size: 95, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:03:17,318 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.837e+00
2024-08-20 19:03:50,236 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 20 from LS+wenet, 12 from Vox, 18 from AS
2024-08-20 19:03:58,108 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0
2024-08-20 19:04:27,976 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2750, loss[loss=0.09329, beats_loss=0.008059, ecapa_loss=0.0001451, whisper_loss=0.08378, over 16369.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001363, whisper_loss=0.0899, over 3826000.73 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:04:30,786 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4917190.0, ans=0.125
2024-08-20 19:04:37,033 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 19:04:41,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4917190.0, ans=0.125
2024-08-20 19:05:14,916 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 from AS
2024-08-20 19:05:16,716 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 from AS
2024-08-20 19:05:18,989 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 23 from LS+wenet, 24 from Vox, 44 from AS
2024-08-20 19:05:22,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4917390.0, ans=10.0
2024-08-20 19:05:27,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4917490.0, ans=0.125
2024-08-20 19:05:28,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.332e+01 2.555e+01 2.897e+01 4.432e+01, threshold=5.110e+01, percent-clipped=0.0
2024-08-20 19:05:34,273 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 from AS
2024-08-20 19:05:36,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0
2024-08-20 19:06:01,422 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-20 19:06:04,501 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2800, loss[loss=0.07831, beats_loss=0.01336, ecapa_loss=0.0001289, whisper_loss=0.06366, over 22881.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001362, whisper_loss=0.0892, over 3818774.13 frames. ], batch size: 95, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:06:30,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=12.0
2024-08-20 19:06:36,924 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4917790.0, ans=0.125
2024-08-20 19:06:38,900 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-08-20 19:06:44,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4917890.0, ans=0.125
2024-08-20 19:07:23,283 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0
2024-08-20 19:07:32,814 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2850, loss[loss=0.09227, beats_loss=0.01165, ecapa_loss=0.0001443, whisper_loss=0.07918, over 21573.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001362, whisper_loss=0.0892, over 3783148.63 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:07:50,762 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 from AS
2024-08-20 19:08:07,160 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.44 vs. limit=22.5
2024-08-20 19:08:13,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4918390.0, ans=0.125
2024-08-20 19:08:13,688 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0
2024-08-20 19:08:17,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4918390.0, ans=0.1
2024-08-20 19:08:20,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4918390.0, ans=0.125
2024-08-20 19:08:22,230 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 30 from LS+wenet, 18 from Vox, 28 from AS
2024-08-20 19:08:29,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.343e+01 2.572e+01 2.859e+01 3.545e+01, threshold=5.143e+01, percent-clipped=0.0
2024-08-20 19:08:29,628 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 15 from LS+wenet, 25 from Vox, 29 from AS
2024-08-20 19:08:44,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4918590.0, ans=0.125
2024-08-20 19:08:47,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4918590.0, ans=0.0
2024-08-20 19:08:52,073 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=22.5
2024-08-20 19:08:53,411 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 17 from LS+wenet, 24 from Vox, 29 from AS
2024-08-20 19:09:03,474 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2900, loss[loss=0.1311, beats_loss=0.005859, ecapa_loss=0.0001547, whisper_loss=0.1237, over 14851.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001369, whisper_loss=0.08953, over 3774523.96 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:09:05,399 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 23 from LS+wenet, 20 from Vox, 19 from AS
2024-08-20 19:09:10,538 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 from AS
2024-08-20 19:09:12,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4918690.0, ans=0.035
2024-08-20 19:09:14,786 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 33 from LS+wenet, 17 from Vox, 31 from AS
2024-08-20 19:09:20,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4918790.0, ans=0.125
2024-08-20 19:09:22,014 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 from AS
2024-08-20 19:09:26,972 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS
2024-08-20 19:09:46,867 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 17 from LS+wenet, 21 from Vox, 34 from AS
2024-08-20 19:09:54,252 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 19:09:56,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5
2024-08-20 19:10:02,209 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 from AS
2024-08-20 19:10:09,231 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 from AS
2024-08-20 19:10:29,530 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 from AS
2024-08-20 19:10:32,716 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 2950, loss[loss=0.08711, beats_loss=0.01199, ecapa_loss=0.0001186, whisper_loss=0.07393, over 23453.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.0001381, whisper_loss=0.08968, over 3769813.37 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:10:38,651 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 19:10:59,161 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 19:11:03,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0
2024-08-20 19:11:04,907 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4919290.0, ans=0.125
2024-08-20 19:11:30,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.232e+01 2.550e+01 2.898e+01 2.799e+02, threshold=5.099e+01, percent-clipped=2.0
2024-08-20 19:11:53,343 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 from AS
2024-08-20 19:11:53,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4919590.0, ans=0.04949747468305833
2024-08-20 19:11:58,786 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 from AS
2024-08-20 19:12:02,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4919590.0, ans=0.0
2024-08-20 19:12:05,893 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3000, loss[loss=0.08488, beats_loss=0.008268, ecapa_loss=0.0001683, whisper_loss=0.07493, over 13524.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.0001385, whisper_loss=0.09059, over 3794746.64 frames.
], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:12:05,894 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 19:12:42,484 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000513, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 19:13:06,225 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on SV_voxceleb1: loss=0.003961, beats_loss=0, ecapa_loss=0.0003961, whisper_loss=0, over 944235.00 frames. 2024-08-20 19:13:40,665 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0495, 2.2948, 2.4409, 2.1247], device='cuda:2') 2024-08-20 19:14:44,920 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 19:14:44,923 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 19:15:00,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4919790.0, ans=10.0 2024-08-20 19:15:08,035 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-20 19:15:23,868 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 from AS
2024-08-20 19:15:35,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4919890.0, ans=0.05
2024-08-20 19:15:44,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4919990.0, ans=0.125
2024-08-20 19:15:56,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4920090.0, ans=0.0
2024-08-20 19:16:04,402 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 from AS
2024-08-20 19:16:11,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4920090.0, ans=0.09899494936611666
2024-08-20 19:16:15,032 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3050, loss[loss=0.08846, beats_loss=0.01213, ecapa_loss=0.0001448, whisper_loss=0.07488, over 15813.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001388, whisper_loss=0.09034, over 3794808.96 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:16:36,155 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 19:16:48,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4920390.0, ans=0.1
2024-08-20 19:17:08,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4920490.0, ans=0.0
2024-08-20 19:17:09,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.284e+01 2.588e+01 2.897e+01 2.080e+02, threshold=5.176e+01, percent-clipped=1.0
2024-08-20 19:17:13,339 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 19:17:38,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4920590.0, ans=0.125
2024-08-20 19:17:41,207 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3100, loss[loss=0.114, beats_loss=0.0103, ecapa_loss=0.0001572, whisper_loss=0.1021, over 19082.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001389, whisper_loss=0.08947, over 3771879.52 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:17:49,756 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 from AS
2024-08-20 19:17:53,498 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 21 from Vox, 20 from AS
2024-08-20 19:17:53,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4920690.0, ans=0.0
2024-08-20 19:17:55,749 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=15.0
2024-08-20 19:18:04,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4920790.0, ans=0.0
2024-08-20 19:18:07,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4920790.0, ans=0.025
2024-08-20 19:18:11,178 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 14 from Vox, 28 from AS
2024-08-20 19:18:13,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4920790.0, ans=0.0
2024-08-20 19:18:23,721 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 from AS
2024-08-20 19:18:27,655 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 from AS
2024-08-20 19:19:02,206 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 from AS
2024-08-20 19:19:05,007 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.112e+01
2024-08-20 19:19:11,051 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3150, loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.09089, over 21873.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.09012, over 3784954.47 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:19:18,858 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 from AS
2024-08-20 19:19:22,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4921190.0, ans=0.0
2024-08-20 19:19:49,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4921390.0, ans=0.0
2024-08-20 19:19:49,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4921390.0, ans=0.125
2024-08-20 19:19:56,325 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS
2024-08-20 19:20:06,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.207e+01 2.457e+01 2.685e+01 3.583e+01, threshold=4.914e+01, percent-clipped=0.0
2024-08-20 19:20:06,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4921490.0, ans=0.1
2024-08-20 19:20:20,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4921590.0, ans=0.125
2024-08-20 19:20:35,457 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4921590.0, ans=0.125
2024-08-20 19:20:38,069 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3200, loss[loss=0.1145, beats_loss=0.01125, ecapa_loss=0.0001159, whisper_loss=0.1021, over 16276.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001401, whisper_loss=0.08962, over 3771777.14 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:20:40,399 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 17 from LS+wenet, 16 from Vox, 18 from AS
2024-08-20 19:20:42,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4921690.0, ans=0.125
2024-08-20 19:20:58,723 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.97 vs. limit=22.5
2024-08-20 19:21:04,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4921790.0, ans=0.125
2024-08-20 19:21:11,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4921890.0, ans=0.0
2024-08-20 19:21:18,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4921890.0, ans=0.0
2024-08-20 19:21:31,174 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 from AS
2024-08-20 19:21:33,471 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 19:21:43,468 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 from AS
2024-08-20 19:21:57,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-08-20 19:22:03,076 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3250, loss[loss=0.1031, beats_loss=0.008367, ecapa_loss=0.0001501, whisper_loss=0.0932, over 14067.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.08951, over 3785951.84 frames.
], batch size: 55, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:22:22,966 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4922290.0, ans=0.1
2024-08-20 19:22:24,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4922290.0, ans=0.125
2024-08-20 19:22:48,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0
2024-08-20 19:22:56,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.200e+01 2.511e+01 2.776e+01 3.425e+01, threshold=5.022e+01, percent-clipped=0.0
2024-08-20 19:23:02,200 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4922490.0, ans=0.1
2024-08-20 19:23:22,190 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 11 from LS+wenet, 8 from Vox, 37 from AS
2024-08-20 19:23:28,480 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3300, loss[loss=0.09796, beats_loss=0.009066, ecapa_loss=0.0001239, whisper_loss=0.08766, over 17367.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.09055, over 3791722.52 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:23:29,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0
2024-08-20 19:23:57,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4922790.0, ans=0.1
2024-08-20 19:24:20,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4922990.0, ans=0.0
2024-08-20 19:24:22,327 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0
2024-08-20 19:24:23,636 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 from AS
2024-08-20 19:24:54,546 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 19:24:55,514 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3350, loss[loss=0.1014, beats_loss=0.009139, ecapa_loss=0.0001449, whisper_loss=0.09086, over 19037.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.000141, whisper_loss=0.09048, over 3779349.04 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:25:03,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4923190.0, ans=0.0
2024-08-20 19:25:06,565 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 from AS
2024-08-20 19:25:09,238 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0
2024-08-20 19:25:11,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4923290.0, ans=0.125
2024-08-20 19:25:13,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4923290.0, ans=0.1
2024-08-20 19:25:42,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4923390.0, ans=0.0
2024-08-20 19:25:45,585 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 from AS
2024-08-20 19:25:49,367 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.220e+01 2.419e+01 2.738e+01 3.918e+01, threshold=4.837e+01, percent-clipped=0.0
2024-08-20 19:25:56,648 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 19:25:58,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4923490.0, ans=0.035
2024-08-20 19:26:08,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4923590.0, ans=0.125
2024-08-20 19:26:15,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4923590.0, ans=0.125
2024-08-20 19:26:21,893 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3400, loss[loss=0.08707, beats_loss=0.01082, ecapa_loss=0.0001338, whisper_loss=0.07491, over 17203.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.09053, over 3750127.75 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:26:33,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4923690.0, ans=0.1
2024-08-20 19:26:44,985 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 from AS
2024-08-20 19:26:50,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4923790.0, ans=0.0
2024-08-20 19:27:06,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4923890.0, ans=0.0
2024-08-20 19:27:16,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4923990.0, ans=0.2
2024-08-20 19:27:16,809 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4923990.0, ans=0.125
2024-08-20 19:27:48,626 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3450, loss[loss=0.1205, beats_loss=0.009145, ecapa_loss=0.0001181, whisper_loss=0.1102, over 24303.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01024, ecapa_loss=0.0001414, whisper_loss=0.09142, over 3800562.51 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:28:13,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4924290.0, ans=0.125
2024-08-20 19:28:15,868 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 from AS
2024-08-20 19:28:24,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4924390.0, ans=0.0
2024-08-20 19:28:42,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.395e+01 2.734e+01 3.067e+01 2.505e+02, threshold=5.467e+01, percent-clipped=4.0
2024-08-20 19:28:45,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4924490.0, ans=0.125
2024-08-20 19:28:48,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4924490.0, ans=0.125
2024-08-20 19:28:51,320 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS
2024-08-20 19:29:08,778 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 from AS
2024-08-20 19:29:12,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5
2024-08-20 19:29:15,166 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3500, loss[loss=0.08951, beats_loss=0.01392, ecapa_loss=0.0001091, whisper_loss=0.07451, over 23132.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001404, whisper_loss=0.09065, over 3783452.68 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:29:20,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4924690.0, ans=0.0
2024-08-20 19:29:23,552 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 19:29:24,933 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts.
18 from LS+wenet, 17 from Vox, 32 from AS
2024-08-20 19:29:34,325 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 from AS
2024-08-20 19:29:43,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4924790.0, ans=0.125
2024-08-20 19:29:54,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4924890.0, ans=0.1
2024-08-20 19:29:55,787 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 from AS
2024-08-20 19:30:04,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4924890.0, ans=0.125
2024-08-20 19:30:18,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4924990.0, ans=0.1
2024-08-20 19:30:31,851 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 19 from LS+wenet, 29 from Vox, 13 from AS
2024-08-20 19:30:41,017 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 from AS
2024-08-20 19:30:42,351 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3550, loss[loss=0.112, beats_loss=0.009825, ecapa_loss=0.0001364, whisper_loss=0.1008, over 20713.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0103, ecapa_loss=0.0001408, whisper_loss=0.09044, over 3756163.42 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:30:56,847 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 19:30:57,578 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2024-08-20 19:31:09,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5
2024-08-20 19:31:36,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.289e+01 2.465e+01 2.729e+01 3.504e+01, threshold=4.930e+01, percent-clipped=0.0
2024-08-20 19:31:54,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4925590.0, ans=0.125
2024-08-20 19:32:04,669 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 from AS
2024-08-20 19:32:04,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4925590.0, ans=0.0
2024-08-20 19:32:07,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4925690.0, ans=0.125
2024-08-20 19:32:09,078 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3600, loss[loss=0.11, beats_loss=0.01043, ecapa_loss=0.000122, whisper_loss=0.09833, over 14929.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001402, whisper_loss=0.08932, over 3768070.69 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:32:14,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4925690.0, ans=0.125
2024-08-20 19:33:00,754 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5
2024-08-20 19:33:15,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4926090.0, ans=0.125
2024-08-20 19:33:17,459 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 28 from LS+wenet, 17 from Vox, 24 from AS
2024-08-20 19:33:34,576 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3650, loss[loss=0.1074, beats_loss=0.01017, ecapa_loss=0.0001458, whisper_loss=0.09574, over 21798.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001386, whisper_loss=0.08923, over 3795966.37 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:33:36,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4926190.0, ans=0.125
2024-08-20 19:33:40,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4926190.0, ans=0.95
2024-08-20 19:33:40,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-08-20 19:33:54,513 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 30 from LS+wenet, 26 from Vox, 18 from AS
2024-08-20 19:34:09,265 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4926390.0, ans=0.125
2024-08-20 19:34:17,331 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 12 from LS+wenet, 27 from Vox, 30 from AS
2024-08-20 19:34:25,231 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 from AS
2024-08-20 19:34:28,498 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.198e+01 2.421e+01 2.738e+01 4.465e+02, threshold=4.843e+01, percent-clipped=1.0
2024-08-20 19:35:01,488 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3700, loss[loss=0.1066, beats_loss=0.008032, ecapa_loss=0.0001777, whisper_loss=0.09679, over 12579.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001398, whisper_loss=0.08993, over 3755269.46 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:35:15,431 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4926690.0, ans=0.04949747468305833
2024-08-20 19:35:28,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0
2024-08-20 19:35:29,922 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4926790.0, ans=0.125
2024-08-20 19:35:35,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0
2024-08-20 19:35:38,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4926890.0, ans=0.125
2024-08-20 19:35:50,498 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 19:35:52,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4926890.0, ans=0.125
2024-08-20 19:35:59,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4926990.0, ans=0.125
2024-08-20 19:36:02,597 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0
2024-08-20 19:36:11,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4927090.0, ans=0.0
2024-08-20 19:36:14,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4927090.0, ans=0.125
2024-08-20 19:36:17,593 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts.
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 19:36:17,889 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4927090.0, ans=0.125 2024-08-20 19:36:29,328 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3750, loss[loss=0.09498, beats_loss=0.01153, ecapa_loss=0.0001691, whisper_loss=0.08176, over 16302.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001405, whisper_loss=0.09008, over 3760818.91 frames. ], batch size: 70, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:36:38,284 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 19:36:38,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4927190.0, ans=0.125 2024-08-20 19:37:01,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4927290.0, ans=0.0 2024-08-20 19:37:22,077 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4927490.0, ans=0.125 2024-08-20 19:37:22,990 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.233e+01 2.505e+01 2.774e+01 5.527e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-20 19:37:23,247 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 19:37:31,262 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
20 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 19:37:43,980 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4927590.0, ans=0.0 2024-08-20 19:37:44,010 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4927590.0, ans=0.0 2024-08-20 19:37:55,089 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3800, loss[loss=0.1025, beats_loss=0.01115, ecapa_loss=0.0001266, whisper_loss=0.09011, over 23147.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001403, whisper_loss=0.08984, over 3772164.27 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:38:13,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4927790.0, ans=0.0 2024-08-20 19:38:30,012 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4927890.0, ans=0.0 2024-08-20 19:38:33,494 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4927890.0, ans=0.125 2024-08-20 19:38:49,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4927990.0, ans=0.0 2024-08-20 19:39:01,205 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 19:39:10,416 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:39:21,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4928190.0, ans=0.1 2024-08-20 19:39:21,975 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3850, loss[loss=0.09758, beats_loss=0.008499, ecapa_loss=0.0001564, whisper_loss=0.08752, over 20105.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01033, ecapa_loss=0.0001411, whisper_loss=0.08951, over 3777585.86 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:39:34,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4928190.0, ans=0.125 2024-08-20 19:39:39,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4928290.0, ans=0.125 2024-08-20 19:39:46,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4928290.0, ans=0.05 2024-08-20 19:40:00,395 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 from AS 2024-08-20 19:40:16,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.370e+01 2.629e+01 2.963e+01 4.700e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-20 19:40:18,807 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 from AS 2024-08-20 19:40:19,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-08-20 19:40:22,483 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 from AS 2024-08-20 19:40:24,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4928490.0, ans=0.125 2024-08-20 19:40:32,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4928590.0, ans=0.2 2024-08-20 19:40:38,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs.
limit=10.0 2024-08-20 19:40:49,456 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 from AS 2024-08-20 19:40:50,864 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3900, loss[loss=0.0861, beats_loss=0.01211, ecapa_loss=0.0001562, whisper_loss=0.07242, over 20350.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01032, ecapa_loss=0.0001414, whisper_loss=0.08922, over 3783245.38 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:40:58,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=12.0 2024-08-20 19:41:17,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4928790.0, ans=0.0 2024-08-20 19:41:20,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4928790.0, ans=0.125 2024-08-20 19:41:29,911 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4928890.0, ans=0.125 2024-08-20 19:41:33,196 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=22.5 2024-08-20 19:41:45,790 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 from AS 2024-08-20 19:42:00,232 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 from AS 2024-08-20 19:42:16,735 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 3950, loss[loss=0.1074, beats_loss=0.01014, ecapa_loss=0.0001398, whisper_loss=0.09587, over 22292.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.08914, over 3781339.66 frames.
], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:42:20,577 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 from AS 2024-08-20 19:42:24,140 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS 2024-08-20 19:42:48,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4929290.0, ans=0.2 2024-08-20 19:43:11,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.625e+01 2.908e+01 3.824e+01, threshold=5.250e+01, percent-clipped=0.0 2024-08-20 19:43:13,436 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-08-20 19:43:17,077 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 12 from Vox, 34 from AS 2024-08-20 19:43:21,216 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 from AS 2024-08-20 19:43:26,429 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.689e+00 2024-08-20 19:43:44,292 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4000, loss[loss=0.09028, beats_loss=0.01304, ecapa_loss=0.0001589, whisper_loss=0.07565, over 21472.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001406, whisper_loss=0.08977, over 3848637.66 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:43:50,051 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts.
26 from LS+wenet, 29 from Vox, 39 from AS 2024-08-20 19:44:00,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4929790.0, ans=0.0 2024-08-20 19:44:39,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=4929990.0, ans=0.02 2024-08-20 19:44:42,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4929990.0, ans=0.0 2024-08-20 19:44:56,886 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 25 from LS+wenet, 28 from Vox, 42 from AS 2024-08-20 19:45:03,980 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS 2024-08-20 19:45:14,026 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4050, loss[loss=0.09744, beats_loss=0.01022, ecapa_loss=0.0001241, whisper_loss=0.08598, over 19006.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.000141, whisper_loss=0.09, over 3852805.25 frames.
], batch size: 74, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:45:25,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4930190.0, ans=0.1 2024-08-20 19:45:26,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4930190.0, ans=0.125 2024-08-20 19:45:45,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4930290.0, ans=0.125 2024-08-20 19:46:10,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4930490.0, ans=0.2 2024-08-20 19:46:11,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.272e+01 2.496e+01 2.748e+01 3.675e+01, threshold=4.991e+01, percent-clipped=0.0 2024-08-20 19:46:19,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-20 19:46:24,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4930590.0, ans=0.125 2024-08-20 19:46:37,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4930590.0, ans=0.0 2024-08-20 19:46:44,590 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4100, loss[loss=0.1058, beats_loss=0.009814, ecapa_loss=0.0001835, whisper_loss=0.0942, over 20813.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.08936, over 3850282.00 frames. 
], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:46:45,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4930690.0, ans=0.2 2024-08-20 19:46:50,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4930690.0, ans=0.2 2024-08-20 19:46:50,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4930690.0, ans=0.0 2024-08-20 19:46:50,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4930690.0, ans=0.0 2024-08-20 19:46:53,460 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 from AS 2024-08-20 19:47:07,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4930790.0, ans=0.125 2024-08-20 19:47:12,854 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 from AS 2024-08-20 19:47:23,887 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4930890.0, ans=0.1 2024-08-20 19:47:25,284 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 12 from LS+wenet, 16 from Vox, 30 from AS 2024-08-20 19:47:27,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4930890.0, ans=0.125 2024-08-20 19:47:36,170 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 from AS 2024-08-20 19:47:52,109 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 from AS 2024-08-20 19:48:04,660 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts.
26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-20 19:48:12,586 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4150, loss[loss=0.1146, beats_loss=0.008375, ecapa_loss=0.0001754, whisper_loss=0.1044, over 17021.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08921, over 3828045.35 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:48:21,876 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 28 from LS+wenet, 28 from Vox, 40 from AS 2024-08-20 19:48:23,125 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 21 from Vox, 19 from AS 2024-08-20 19:48:25,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4931190.0, ans=0.125 2024-08-20 19:48:40,038 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 from AS 2024-08-20 19:49:08,665 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 16 from LS+wenet, 17 from Vox, 50 from AS 2024-08-20 19:49:09,967 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.357e+01 2.563e+01 2.804e+01 4.051e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-20 19:49:24,523 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 20 from LS+wenet, 13 from Vox, 19 from AS 2024-08-20 19:49:42,269 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4200, loss[loss=0.1023, beats_loss=0.01249, ecapa_loss=0.0001574, whisper_loss=0.08826, over 21394.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001399, whisper_loss=0.08937, over 3848979.87 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:49:47,968 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts.
29 from LS+wenet, 18 from Vox, 27 from AS 2024-08-20 19:49:51,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4931690.0, ans=0.0 2024-08-20 19:49:55,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4931690.0, ans=0.125 2024-08-20 19:49:59,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4931790.0, ans=0.125 2024-08-20 19:50:12,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4931790.0, ans=0.125 2024-08-20 19:50:17,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.40 vs. limit=12.0 2024-08-20 19:50:35,647 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 29 from Vox, 26 from AS 2024-08-20 19:50:39,521 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4931990.0, ans=0.125 2024-08-20 19:50:42,411 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-20 19:50:46,138 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS 2024-08-20 19:50:53,209 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 33 from LS+wenet, 16 from Vox, 24 from AS 2024-08-20 19:50:56,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4932090.0, ans=0.015 2024-08-20 19:51:01,545 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts.
30 from LS+wenet, 22 from Vox, 40 from AS 2024-08-20 19:51:01,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4932090.0, ans=0.025 2024-08-20 19:51:11,295 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4250, loss[loss=0.08051, beats_loss=0.01213, ecapa_loss=0.0001307, whisper_loss=0.06707, over 18694.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.08961, over 3839383.29 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:51:17,436 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4932190.0, ans=0.125 2024-08-20 19:51:21,094 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 20 from LS+wenet, 25 from Vox, 27 from AS 2024-08-20 19:51:53,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4932390.0, ans=0.1 2024-08-20 19:52:00,824 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4932390.0, ans=0.0 2024-08-20 19:52:08,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.607e+01 2.265e+01 2.580e+01 2.966e+01 3.429e+02, threshold=5.160e+01, percent-clipped=3.0 2024-08-20 19:52:16,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4932490.0, ans=0.125 2024-08-20 19:52:18,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4932490.0, ans=0.125 2024-08-20 19:52:21,241 WARNING [optim.py:496] (2/4) Scaling gradients by 0.022736379876732826, model_norm_threshold=51.5983772277832 2024-08-20 19:52:21,410 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with
proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.129e+05, grad_sumsq=7.129e+05, orig_rms_sq=1.000e+00 2024-08-20 19:52:25,215 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-20 19:52:26,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4932590.0, ans=0.125 2024-08-20 19:52:35,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4932590.0, ans=0.1 2024-08-20 19:52:39,737 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4300, loss[loss=0.1123, beats_loss=0.009501, ecapa_loss=0.0001339, whisper_loss=0.1015, over 22276.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001423, whisper_loss=0.08956, over 3815176.72 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:52:54,452 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 from AS 2024-08-20 19:53:08,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-20 19:53:11,621 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 23 from LS+wenet, 22 from Vox, 41 from AS 2024-08-20 19:53:52,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4933090.0, ans=0.0 2024-08-20 19:53:56,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.19 vs. limit=10.0 2024-08-20 19:54:08,032 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4350, loss[loss=0.0975, beats_loss=0.01098, ecapa_loss=0.0001354, whisper_loss=0.08516, over 17065.00 frames.
], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001415, whisper_loss=0.08868, over 3783460.47 frames. ], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:54:10,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4933190.0, ans=0.125 2024-08-20 19:54:22,634 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4933190.0, ans=0.125 2024-08-20 19:54:35,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4933290.0, ans=0.0 2024-08-20 19:54:39,425 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 16 from LS+wenet, 24 from Vox, 26 from AS 2024-08-20 19:54:41,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4933390.0, ans=0.1 2024-08-20 19:54:59,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4933490.0, ans=0.2 2024-08-20 19:55:04,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.347e+01 2.590e+01 2.980e+01 2.269e+03, threshold=5.180e+01, percent-clipped=1.0 2024-08-20 19:55:07,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4933490.0, ans=0.125 2024-08-20 19:55:15,947 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2024-08-20 19:55:22,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4933590.0, ans=0.0 2024-08-20 19:55:26,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs.
limit=5.0 2024-08-20 19:55:30,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2024-08-20 19:55:35,533 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4400, loss[loss=0.1159, beats_loss=0.01065, ecapa_loss=0.0001354, whisper_loss=0.1039, over 22046.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.08922, over 3786413.68 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:55:44,182 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 from AS 2024-08-20 19:55:54,502 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 from AS 2024-08-20 19:56:00,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4933790.0, ans=0.0 2024-08-20 19:56:05,788 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 14 from Vox, 41 from AS 2024-08-20 19:56:11,136 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-20 19:56:23,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4933890.0, ans=0.0 2024-08-20 19:56:41,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4933990.0, ans=0.035 2024-08-20 19:56:51,841 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts.
31 from LS+wenet, 17 from Vox, 45 from AS 2024-08-20 19:56:54,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4934090.0, ans=0.2 2024-08-20 19:56:58,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4934090.0, ans=0.125 2024-08-20 19:57:05,946 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4450, loss[loss=0.1022, beats_loss=0.009794, ecapa_loss=0.0001659, whisper_loss=0.09072, over 19666.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.00014, whisper_loss=0.08951, over 3791274.82 frames. ], batch size: 78, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:57:09,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4934190.0, ans=0.1 2024-08-20 19:57:35,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4934290.0, ans=0.125 2024-08-20 19:57:41,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4934390.0, ans=0.125 2024-08-20 19:58:01,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+01 2.339e+01 2.669e+01 2.965e+01 4.502e+01, threshold=5.338e+01, percent-clipped=0.0 2024-08-20 19:58:03,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4934490.0, ans=0.05 2024-08-20 19:58:24,688 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts.
21 from LS+wenet, 16 from Vox, 20 from AS 2024-08-20 19:58:24,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4934590.0, ans=0.0 2024-08-20 19:58:30,649 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4500, loss[loss=0.09936, beats_loss=0.01041, ecapa_loss=0.0001379, whisper_loss=0.08758, over 13407.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.08896, over 3776991.88 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:58:39,640 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 17 from LS+wenet, 38 from Vox, 35 from AS 2024-08-20 19:59:02,053 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4934790.0, ans=0.125 2024-08-20 19:59:18,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4934890.0, ans=0.125 2024-08-20 19:59:28,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4934990.0, ans=0.125 2024-08-20 19:59:43,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4935090.0, ans=0.2 2024-08-20 19:59:46,914 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 from AS 2024-08-20 19:59:52,608 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-08-20 19:59:53,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs.
limit=15.0 2024-08-20 19:59:54,421 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4550, loss[loss=0.1106, beats_loss=0.01172, ecapa_loss=0.0001188, whisper_loss=0.09773, over 23964.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.08929, over 3818786.65 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:00:05,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5 2024-08-20 20:00:06,574 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-20 20:00:14,998 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 from AS 2024-08-20 20:00:27,056 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 from AS 2024-08-20 20:00:28,611 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.49 vs. limit=22.5 2024-08-20 20:00:49,716 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-20 20:00:50,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.294e+01 2.455e+01 2.830e+01 3.953e+01, threshold=4.911e+01, percent-clipped=0.0 2024-08-20 20:00:52,469 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts.
17 from LS+wenet, 18 from Vox, 17 from AS 2024-08-20 20:00:56,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4935490.0, ans=0.125 2024-08-20 20:00:56,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4935490.0, ans=0.04949747468305833 2024-08-20 20:01:22,205 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4600, loss[loss=0.1024, beats_loss=0.01071, ecapa_loss=0.0001317, whisper_loss=0.09039, over 20582.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01044, ecapa_loss=0.0001406, whisper_loss=0.0889, over 3813912.38 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:01:41,039 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-20 20:01:50,612 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 from AS 2024-08-20 20:01:51,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4935790.0, ans=0.0 2024-08-20 20:01:53,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4935790.0, ans=0.04949747468305833 2024-08-20 20:02:03,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-20 20:02:17,221 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.58 vs.
limit=22.5 2024-08-20 20:02:29,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4935990.0, ans=0.125 2024-08-20 20:02:38,614 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 from AS 2024-08-20 20:02:42,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4936090.0, ans=0.125 2024-08-20 20:02:48,902 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4650, loss[loss=0.1126, beats_loss=0.009205, ecapa_loss=0.0001405, whisper_loss=0.102, over 19436.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08925, over 3827109.09 frames. ], batch size: 76, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:03:02,689 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 from AS 2024-08-20 20:03:06,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4936290.0, ans=0.0 2024-08-20 20:03:20,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4936290.0, ans=0.125 2024-08-20 20:03:26,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4936390.0, ans=0.125 2024-08-20 20:03:26,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4936390.0, ans=0.0 2024-08-20 20:03:34,139 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts.
21 from LS+wenet, 9 from Vox, 23 from AS 2024-08-20 20:03:38,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4936390.0, ans=0.0 2024-08-20 20:03:43,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.237e+01 2.528e+01 2.827e+01 5.668e+01, threshold=5.055e+01, percent-clipped=2.0 2024-08-20 20:03:51,432 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.65 vs. limit=10.0 2024-08-20 20:03:57,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4936590.0, ans=0.0 2024-08-20 20:03:59,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4936590.0, ans=0.125 2024-08-20 20:04:15,052 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4700, loss[loss=0.122, beats_loss=0.007202, ecapa_loss=0.000143, whisper_loss=0.1134, over 15462.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001397, whisper_loss=0.08959, over 3824601.15 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:04:15,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4936690.0, ans=0.0 2024-08-20 20:04:29,103 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:04:35,278 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 12 from LS+wenet, 25 from Vox, 27 from AS 2024-08-20 20:04:49,196 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 from AS 2024-08-20 20:05:07,944 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts.
16 from LS+wenet, 13 from Vox, 29 from AS 2024-08-20 20:05:19,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4936990.0, ans=0.1 2024-08-20 20:05:23,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4937090.0, ans=0.125 2024-08-20 20:05:28,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4937090.0, ans=0.125 2024-08-20 20:05:32,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2024-08-20 20:05:39,983 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4750, loss[loss=0.1048, beats_loss=0.01072, ecapa_loss=0.0001216, whisper_loss=0.09285, over 20577.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08903, over 3823983.37 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:05:56,304 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 from AS 2024-08-20 20:05:56,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4937290.0, ans=0.0 2024-08-20 20:05:59,917 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 9 from LS+wenet, 17 from Vox, 28 from AS 2024-08-20 20:06:12,533 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts.
22 from LS+wenet, 22 from Vox, 41 from AS
2024-08-20 20:06:21,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4937390.0, ans=0.2
2024-08-20 20:06:37,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.283e+01 2.561e+01 2.830e+01 4.199e+01, threshold=5.121e+01, percent-clipped=0.0
2024-08-20 20:06:54,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4937590.0, ans=0.0
2024-08-20 20:07:00,395 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 from AS
2024-08-20 20:07:09,507 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4800, loss[loss=0.0969, beats_loss=0.01082, ecapa_loss=0.0001145, whisper_loss=0.08493, over 15320.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.08976, over 3808453.94 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:07:21,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4937690.0, ans=0.125
2024-08-20 20:07:29,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4937790.0, ans=0.035
2024-08-20 20:07:34,338 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 20:07:38,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4937790.0, ans=0.125
2024-08-20 20:07:39,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4937790.0, ans=0.125
2024-08-20 20:07:44,202 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts.
24 from LS+wenet, 27 from Vox, 40 from AS
2024-08-20 20:07:58,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5
2024-08-20 20:08:02,679 WARNING [optim.py:496] (2/4) Scaling gradients by 0.025113865733146667, model_norm_threshold=51.21064758300781
2024-08-20 20:08:02,846 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+06, grad_sumsq=1.205e+06, orig_rms_sq=1.000e+00
2024-08-20 20:08:14,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4937990.0, ans=0.0
2024-08-20 20:08:19,628 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 11 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 20:08:35,266 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 from AS
2024-08-20 20:08:37,931 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4850, loss[loss=0.08517, beats_loss=0.01056, ecapa_loss=0.0001056, whisper_loss=0.07356, over 14629.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001395, whisper_loss=0.08928, over 3820555.00 frames.
], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:09:20,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4938390.0, ans=0.0
2024-08-20 20:09:22,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4938390.0, ans=0.0
2024-08-20 20:09:34,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.379e+01 2.535e+01 2.834e+01 2.039e+03, threshold=5.069e+01, percent-clipped=1.0
2024-08-20 20:09:58,428 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4938590.0, ans=0.125
2024-08-20 20:10:02,803 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS
2024-08-20 20:10:05,585 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4900, loss[loss=0.1091, beats_loss=0.007086, ecapa_loss=0.0001871, whisper_loss=0.1001, over 18178.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.000139, whisper_loss=0.08908, over 3852647.38 frames.
], batch size: 76, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:10:06,486 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4938690.0, ans=0.125
2024-08-20 20:10:18,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4938690.0, ans=0.1
2024-08-20 20:10:54,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4938890.0, ans=0.125
2024-08-20 20:11:13,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4938990.0, ans=0.2
2024-08-20 20:11:21,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4939090.0, ans=0.0
2024-08-20 20:11:23,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4939090.0, ans=0.125
2024-08-20 20:11:34,777 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 4950, loss[loss=0.09207, beats_loss=0.01063, ecapa_loss=0.0001417, whisper_loss=0.08002, over 21240.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01053, ecapa_loss=0.0001393, whisper_loss=0.08906, over 3882480.20 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:11:42,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.75 vs.
limit=15.0
2024-08-20 20:12:09,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4939290.0, ans=0.1
2024-08-20 20:12:23,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4939390.0, ans=0.125
2024-08-20 20:12:28,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4939490.0, ans=0.125
2024-08-20 20:12:32,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4939490.0, ans=0.0
2024-08-20 20:12:32,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.356e+01 2.576e+01 2.948e+01 1.126e+02, threshold=5.153e+01, percent-clipped=1.0
2024-08-20 20:12:54,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4939590.0, ans=0.0
2024-08-20 20:13:01,917 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-20 20:13:04,121 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 from AS
2024-08-20 20:13:05,121 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5000, loss[loss=0.1095, beats_loss=0.008901, ecapa_loss=0.0001314, whisper_loss=0.09928, over 22115.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001393, whisper_loss=0.08966, over 3897540.90 frames.
], batch size: 86, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:13:18,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4939690.0, ans=0.1
2024-08-20 20:13:59,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4939990.0, ans=0.125
2024-08-20 20:14:13,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4939990.0, ans=0.0
2024-08-20 20:14:23,903 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS
2024-08-20 20:14:36,053 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5050, loss[loss=0.08414, beats_loss=0.01244, ecapa_loss=0.0001207, whisper_loss=0.07049, over 21927.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.08979, over 3879294.73 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:14:38,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4940190.0, ans=0.2
2024-08-20 20:15:14,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4940390.0, ans=0.0
2024-08-20 20:15:32,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4940490.0, ans=0.0
2024-08-20 20:15:33,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.260e+01 2.507e+01 2.805e+01 5.478e+01, threshold=5.013e+01, percent-clipped=1.0
2024-08-20 20:15:33,605 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4940490.0, ans=0.1
2024-08-20 20:16:04,929 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5100, loss[loss=0.09513, beats_loss=0.01086,
ecapa_loss=0.0001288, whisper_loss=0.08298, over 20343.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.08976, over 3886562.31 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:16:12,787 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 from AS
2024-08-20 20:16:27,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4940790.0, ans=15.0
2024-08-20 20:16:31,820 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 18 from LS+wenet, 26 from Vox, 36 from AS
2024-08-20 20:16:40,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-08-20 20:16:54,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0
2024-08-20 20:17:24,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0
2024-08-20 20:17:32,262 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5150, loss[loss=0.09846, beats_loss=0.0113, ecapa_loss=0.0001486, whisper_loss=0.08567, over 16575.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.08953, over 3876997.28 frames.
], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:17:37,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4941190.0, ans=0.0
2024-08-20 20:17:41,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4941190.0, ans=0.0
2024-08-20 20:17:41,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2024-08-20 20:17:47,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=22.5
2024-08-20 20:17:59,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4941290.0, ans=0.125
2024-08-20 20:18:27,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.219e+01 2.541e+01 2.868e+01 3.859e+01, threshold=5.083e+01, percent-clipped=0.0
2024-08-20 20:18:36,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4941490.0, ans=0.125
2024-08-20 20:18:46,709 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 from AS
2024-08-20 20:18:54,209 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4941590.0, ans=0.1
2024-08-20 20:18:57,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4941690.0, ans=0.125
2024-08-20 20:18:57,902 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5200, loss[loss=0.09391, beats_loss=0.01124, ecapa_loss=0.0001286, whisper_loss=0.08139, over 16195.00 frames.
], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.08914, over 3821799.01 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:19:10,896 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 31 from LS+wenet, 16 from Vox, 36 from AS
2024-08-20 20:19:11,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4941690.0, ans=0.125
2024-08-20 20:19:24,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4941790.0, ans=0.015
2024-08-20 20:19:45,729 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 10 from LS+wenet, 15 from Vox, 25 from AS
2024-08-20 20:20:11,197 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 from AS
2024-08-20 20:20:11,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=12.0
2024-08-20 20:20:15,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4942090.0, ans=0.125
2024-08-20 20:20:23,749 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4942090.0, ans=0.0
2024-08-20 20:20:26,008 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5250, loss[loss=0.1063, beats_loss=0.008365, ecapa_loss=0.0001583, whisper_loss=0.09636, over 20918.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.08994, over 3837100.39 frames.
], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:20:39,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4942190.0, ans=0.1
2024-08-20 20:20:44,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4942290.0, ans=0.0
2024-08-20 20:20:44,666 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-20 20:20:56,452 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 from AS
2024-08-20 20:21:19,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4942490.0, ans=0.125
2024-08-20 20:21:22,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.241e+01 2.519e+01 2.751e+01 3.972e+01, threshold=5.038e+01, percent-clipped=0.0
2024-08-20 20:21:34,762 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 from AS
2024-08-20 20:21:53,478 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5300, loss[loss=0.1171, beats_loss=0.006354, ecapa_loss=0.0001512, whisper_loss=0.1092, over 16504.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.09004, over 3783674.52 frames.
], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:22:10,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4942790.0, ans=0.2
2024-08-20 20:22:27,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4942790.0, ans=0.1
2024-08-20 20:22:34,161 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4942890.0, ans=0.125
2024-08-20 20:22:50,147 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0
2024-08-20 20:23:08,438 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 from AS
2024-08-20 20:23:12,181 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 10 from LS+wenet, 17 from Vox, 26 from AS
2024-08-20 20:23:19,128 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 19 from LS+wenet, 22 from Vox, 40 from AS
2024-08-20 20:23:22,215 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5350, loss[loss=0.06972, beats_loss=0.0129, ecapa_loss=0.0001242, whisper_loss=0.05557, over 16021.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001387, whisper_loss=0.08981, over 3766930.85 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:23:29,781 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 from AS
2024-08-20 20:23:34,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4943190.0, ans=0.07
2024-08-20 20:23:37,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4943190.0, ans=0.125
2024-08-20 20:24:00,252 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts.
17 from LS+wenet, 18 from Vox, 37 from AS
2024-08-20 20:24:02,035 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 19 from LS+wenet, 17 from Vox, 39 from AS
2024-08-20 20:24:12,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4943390.0, ans=0.0
2024-08-20 20:24:16,870 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 from AS
2024-08-20 20:24:19,651 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.338e+01 2.503e+01 2.804e+01 4.042e+01, threshold=5.006e+01, percent-clipped=0.0
2024-08-20 20:24:31,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4943490.0, ans=0.0
2024-08-20 20:24:34,127 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS
2024-08-20 20:24:51,253 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5400, loss[loss=0.114, beats_loss=0.01024, ecapa_loss=0.0001103, whisper_loss=0.1027, over 22404.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09003, over 3770550.41 frames.
], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:24:57,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4943690.0, ans=0.0
2024-08-20 20:24:57,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4943690.0, ans=0.2
2024-08-20 20:24:57,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4943690.0, ans=0.1
2024-08-20 20:25:07,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4943790.0, ans=0.5
2024-08-20 20:25:16,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4943790.0, ans=0.1
2024-08-20 20:25:30,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0
2024-08-20 20:25:45,305 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4943990.0, ans=0.2
2024-08-20 20:25:55,924 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 20:26:17,973 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5450, loss[loss=0.09713, beats_loss=0.009647, ecapa_loss=0.0001399, whisper_loss=0.08609, over 14033.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.08971, over 3763846.78 frames. ], batch size: 55, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:26:26,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.71 vs.
limit=15.0
2024-08-20 20:26:32,059 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4944190.0, ans=0.125
2024-08-20 20:26:34,834 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 15 from Vox, 42 from AS
2024-08-20 20:26:50,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4944290.0, ans=0.1
2024-08-20 20:27:06,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4944390.0, ans=0.125
2024-08-20 20:27:14,030 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 from AS
2024-08-20 20:27:17,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.208e+01 2.419e+01 2.750e+01 4.613e+01, threshold=4.839e+01, percent-clipped=0.0
2024-08-20 20:27:19,349 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 from AS
2024-08-20 20:27:35,290 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 from AS
2024-08-20 20:27:37,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4944590.0, ans=0.09899494936611666
2024-08-20 20:27:48,474 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5500, loss[loss=0.09548, beats_loss=0.01173, ecapa_loss=0.0001401, whisper_loss=0.08235, over 22740.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001385, whisper_loss=0.08929, over 3769654.53 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:27:48,656 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts.
19 from LS+wenet, 13 from Vox, 26 from AS
2024-08-20 20:27:58,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0
2024-08-20 20:28:02,695 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 34 from LS+wenet, 16 from Vox, 42 from AS
2024-08-20 20:28:06,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4944790.0, ans=0.125
2024-08-20 20:28:12,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4944790.0, ans=0.0
2024-08-20 20:28:15,724 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.836e-02
2024-08-20 20:28:17,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4944790.0, ans=0.1
2024-08-20 20:28:17,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4944790.0, ans=0.2
2024-08-20 20:28:28,640 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 from AS
2024-08-20 20:29:02,868 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS
2024-08-20 20:29:15,529 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.310e+01
2024-08-20 20:29:16,159 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5550, loss[loss=0.09044, beats_loss=0.0107, ecapa_loss=0.0001304, whisper_loss=0.07844, over 18495.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001379, whisper_loss=0.08959, over 3789955.83 frames.
], batch size: 75, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:29:38,560 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=22.5
2024-08-20 20:29:40,177 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0
2024-08-20 20:30:04,829 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4945390.0, ans=0.125
2024-08-20 20:30:12,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.276e+01 2.520e+01 2.741e+01 3.796e+01, threshold=5.039e+01, percent-clipped=0.0
2024-08-20 20:30:16,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4945490.0, ans=0.125
2024-08-20 20:30:23,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4945490.0, ans=0.0
2024-08-20 20:30:27,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4945590.0, ans=0.2
2024-08-20 20:30:44,038 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5600, loss[loss=0.1111, beats_loss=0.008733, ecapa_loss=0.0001711, whisper_loss=0.1006, over 17245.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001386, whisper_loss=0.08949, over 3796172.72 frames.
], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:30:57,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4945690.0, ans=0.1
2024-08-20 20:30:57,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4945690.0, ans=0.125
2024-08-20 20:31:06,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4945790.0, ans=0.125
2024-08-20 20:31:14,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4945790.0, ans=0.1
2024-08-20 20:31:35,304 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4945890.0, ans=0.1
2024-08-20 20:31:36,281 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 from AS
2024-08-20 20:32:12,739 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5650, loss[loss=0.08664, beats_loss=0.01356, ecapa_loss=0.0001564, whisper_loss=0.07152, over 20678.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001394, whisper_loss=0.09013, over 3814120.28 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:32:29,988 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts.
15 from LS+wenet, 16 from Vox, 22 from AS
2024-08-20 20:32:37,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4946290.0, ans=0.125
2024-08-20 20:33:09,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.323e+01 2.509e+01 2.836e+01 4.746e+01, threshold=5.018e+01, percent-clipped=0.0
2024-08-20 20:33:22,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4946490.0, ans=0.125
2024-08-20 20:33:26,540 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 from AS
2024-08-20 20:33:38,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4946590.0, ans=0.125
2024-08-20 20:33:40,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4946590.0, ans=0.1
2024-08-20 20:33:43,477 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5700, loss[loss=0.06155, beats_loss=0.01124, ecapa_loss=0.0001948, whisper_loss=0.04836, over 14592.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001393, whisper_loss=0.08996, over 3797556.58 frames. ], batch size: 62, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:33:44,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4946690.0, ans=0.0
2024-08-20 20:33:49,365 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.78 vs.
limit=15.0
2024-08-20 20:33:53,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4946690.0, ans=0.0
2024-08-20 20:34:07,781 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4946790.0, ans=0.125
2024-08-20 20:34:08,778 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 19 from Vox, 42 from AS
2024-08-20 20:34:12,927 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 31 from LS+wenet, 18 from Vox, 36 from AS
2024-08-20 20:34:21,537 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 19 from LS+wenet, 21 from Vox, 42 from AS
2024-08-20 20:34:29,606 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS
2024-08-20 20:34:40,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4946890.0, ans=0.125
2024-08-20 20:34:49,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4946990.0, ans=0.125
2024-08-20 20:35:15,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4947090.0, ans=0.05
2024-08-20 20:35:19,982 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5750, loss[loss=0.1028, beats_loss=0.01006, ecapa_loss=0.0001517, whisper_loss=0.09125, over 17752.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01029, ecapa_loss=0.0001392, whisper_loss=0.08949, over 3801895.49 frames.
], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:35:44,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4947290.0, ans=0.125
2024-08-20 20:35:58,856 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.175e+05
2024-08-20 20:36:02,258 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4947390.0, ans=0.125
2024-08-20 20:36:06,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4947390.0, ans=0.125
2024-08-20 20:36:14,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4947390.0, ans=0.2
2024-08-20 20:36:23,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.253e+01 2.565e+01 2.811e+01 3.552e+01, threshold=5.130e+01, percent-clipped=0.0
2024-08-20 20:36:29,292 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4947490.0, ans=0.0
2024-08-20 20:36:36,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4947490.0, ans=0.0
2024-08-20 20:36:49,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4947590.0, ans=0.125
2024-08-20 20:36:53,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-08-20 20:36:57,630 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5800, loss[loss=0.1073, beats_loss=0.009398, ecapa_loss=0.0001507, whisper_loss=0.09642, over 23333.00 frames.
], tot_loss[loss=0.1011, beats_loss=0.01036, ecapa_loss=0.0001402, whisper_loss=0.08934, over 3834200.10 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:37:02,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4947690.0, ans=0.2 2024-08-20 20:37:13,764 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 20:37:15,596 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-08-20 20:37:17,410 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4947790.0, ans=0.125 2024-08-20 20:37:26,952 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 20:37:28,344 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.99 vs. 
limit=6.0 2024-08-20 20:37:42,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4947890.0, ans=0.125 2024-08-20 20:37:44,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4947890.0, ans=0.0 2024-08-20 20:37:47,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4947890.0, ans=0.0 2024-08-20 20:37:49,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4947890.0, ans=0.125 2024-08-20 20:38:18,392 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4948090.0, ans=0.05 2024-08-20 20:38:27,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4948090.0, ans=0.2 2024-08-20 20:38:29,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4948090.0, ans=0.0 2024-08-20 20:38:30,484 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.213e+05 2024-08-20 20:38:36,420 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5850, loss[loss=0.08795, beats_loss=0.01266, ecapa_loss=0.0001208, whisper_loss=0.07409, over 23624.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.08898, over 3847704.26 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:38:41,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4948190.0, ans=0.125 2024-08-20 20:38:47,549 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 20:38:57,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4948290.0, ans=0.125 2024-08-20 20:38:57,702 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-20 20:39:02,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4948290.0, ans=0.1 2024-08-20 20:39:06,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4948290.0, ans=0.125 2024-08-20 20:39:09,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4948290.0, ans=0.0 2024-08-20 20:39:12,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4948390.0, ans=0.125 2024-08-20 20:39:31,714 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 20:39:33,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.476e+01 2.710e+01 3.923e+01, threshold=4.952e+01, percent-clipped=0.0 2024-08-20 20:39:38,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4948490.0, ans=0.2 2024-08-20 20:39:42,693 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 20:39:51,050 WARNING [optim.py:496] (2/4) Scaling gradients by 0.021723005920648575, model_norm_threshold=49.52134323120117 2024-08-20 20:39:51,219 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.306e+06, grad_sumsq=3.974e+05, orig_rms_sq=3.286e+00 2024-08-20 20:39:53,013 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 21 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 20:40:06,630 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5900, loss[loss=0.09968, beats_loss=0.01003, ecapa_loss=0.000127, whisper_loss=0.08838, over 15472.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001402, whisper_loss=0.08936, over 3842485.23 frames. ], batch size: 58, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:40:47,622 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4948890.0, ans=15.0 2024-08-20 20:41:05,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4948990.0, ans=0.1 2024-08-20 20:41:09,832 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 20:41:33,666 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4949090.0, ans=0.125 2024-08-20 20:41:36,090 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 5950, loss[loss=0.1395, beats_loss=0.006972, ecapa_loss=0.0001279, whisper_loss=0.1313, over 15853.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001395, whisper_loss=0.08899, over 3842130.56 frames. 
], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:41:39,104 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2024-08-20 20:41:54,157 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4949290.0, ans=0.0 2024-08-20 20:42:13,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4949390.0, ans=0.0 2024-08-20 20:42:22,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4949390.0, ans=0.1 2024-08-20 20:42:27,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4949390.0, ans=0.125 2024-08-20 20:42:33,655 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.390e+01 2.635e+01 2.817e+01 2.280e+03, threshold=5.271e+01, percent-clipped=1.0 2024-08-20 20:42:42,698 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-20 20:42:48,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4949590.0, ans=0.0 2024-08-20 20:43:06,089 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6000, loss[loss=0.07506, beats_loss=0.01119, ecapa_loss=0.0001593, whisper_loss=0.06227, over 15234.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.08929, over 3833346.01 frames. 
], batch size: 64, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:43:06,090 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 20:43:57,945 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005083, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 20:44:22,277 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on SV_voxceleb1: loss=0.003999, beats_loss=0, ecapa_loss=0.0003999, whisper_loss=0, over 944235.00 frames. 2024-08-20 20:45:22,414 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4580, 1.9010, 1.6244, 1.3790, 1.5740, 1.4450, 1.7026, 1.6195], device='cuda:2') 2024-08-20 20:45:57,893 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 20:45:57,896 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 20:46:08,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4949690.0, ans=0.125 2024-08-20 20:46:08,458 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-20 20:46:24,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4949790.0, ans=0.1 2024-08-20 20:47:23,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4950090.0, ans=0.05 2024-08-20 20:47:36,897 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6050, loss[loss=0.1166, beats_loss=0.01283, ecapa_loss=8.959e-05, whisper_loss=0.1029, over 22581.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01053, ecapa_loss=0.0001381, whisper_loss=0.08872, over 3853624.96 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:47:38,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4950190.0, ans=0.0 2024-08-20 20:48:19,380 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=12.0 2024-08-20 20:48:28,367 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 24 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 20:48:39,018 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 20:48:42,527 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 20:48:53,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.341e+01 2.597e+01 2.876e+01 5.831e+01, threshold=5.193e+01, percent-clipped=1.0 2024-08-20 20:48:59,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4950490.0, ans=0.125 2024-08-20 20:49:03,362 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=12.0 2024-08-20 20:49:27,821 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6100, loss[loss=0.1265, beats_loss=0.007212, ecapa_loss=0.0001959, whisper_loss=0.1174, over 16512.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001376, whisper_loss=0.0888, over 3852258.01 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:49:39,247 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
17 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 20:49:45,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4950690.0, ans=0.2 2024-08-20 20:49:56,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4950790.0, ans=0.1 2024-08-20 20:50:06,627 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 27 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-20 20:50:07,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4950790.0, ans=0.1 2024-08-20 20:50:19,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4950890.0, ans=0.125 2024-08-20 20:50:23,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4950890.0, ans=0.125 2024-08-20 20:50:47,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4950990.0, ans=0.0 2024-08-20 20:50:51,542 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 20:51:17,216 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6150, loss[loss=0.06127, beats_loss=0.0144, ecapa_loss=9.848e-05, whisper_loss=0.04588, over 15847.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01052, ecapa_loss=0.0001371, whisper_loss=0.08882, over 3829606.63 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:51:59,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4951290.0, ans=0.95 2024-08-20 20:52:23,554 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 20:52:27,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.295e+01 2.472e+01 2.689e+01 4.282e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-20 20:52:44,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4951590.0, ans=0.05 2024-08-20 20:53:06,735 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6200, loss[loss=0.09061, beats_loss=0.01433, ecapa_loss=0.0001193, whisper_loss=0.07509, over 19015.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001376, whisper_loss=0.08977, over 3840916.01 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:53:14,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4951690.0, ans=0.125 2024-08-20 20:53:36,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4951790.0, ans=0.035 2024-08-20 20:54:56,588 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6250, loss[loss=0.1225, beats_loss=0.00895, ecapa_loss=0.0001369, whisper_loss=0.1121, over 21919.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001387, whisper_loss=0.09033, over 3843245.83 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:54:56,766 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 20 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-20 20:55:01,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=12.0 2024-08-20 20:55:06,067 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 20:55:10,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4952190.0, ans=0.125 2024-08-20 20:55:15,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4952290.0, ans=0.125 2024-08-20 20:55:38,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4952390.0, ans=0.125 2024-08-20 20:55:40,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4952390.0, ans=0.125 2024-08-20 20:55:45,926 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:56:06,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.296e+01 2.528e+01 2.851e+01 2.776e+02, threshold=5.056e+01, percent-clipped=4.0 2024-08-20 20:56:47,424 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6300, loss[loss=0.1048, beats_loss=0.01065, ecapa_loss=0.0001353, whisper_loss=0.09279, over 21808.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001394, whisper_loss=0.09034, over 3844887.16 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:56:50,840 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 20:56:52,156 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4952690.0, ans=0.125 2024-08-20 20:57:36,005 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
15 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 20:57:47,301 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4952890.0, ans=0.125 2024-08-20 20:57:50,660 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:58:44,207 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6350, loss[loss=0.1009, beats_loss=0.01019, ecapa_loss=0.0001372, whisper_loss=0.08929, over 15205.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001387, whisper_loss=0.09047, over 3887367.45 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:59:03,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.05 vs. limit=10.0 2024-08-20 20:59:10,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-20 20:59:19,354 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 20:59:38,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4953390.0, ans=0.125 2024-08-20 20:59:38,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4953390.0, ans=0.0 2024-08-20 20:59:51,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4953490.0, ans=0.0 2024-08-20 20:59:52,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.372e+01 2.595e+01 2.941e+01 1.196e+02, threshold=5.191e+01, percent-clipped=6.0 2024-08-20 21:00:06,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4953590.0, ans=0.09899494936611666 2024-08-20 21:00:07,893 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 45 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 21:00:10,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4953590.0, ans=0.125 2024-08-20 21:00:28,038 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 21:00:29,446 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6400, loss[loss=0.1055, beats_loss=0.009745, ecapa_loss=0.0001545, whisper_loss=0.09421, over 14989.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.0904, over 3884517.85 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:00:29,703 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 21:00:36,630 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4953690.0, ans=0.0 2024-08-20 21:00:44,940 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4953690.0, ans=0.125 2024-08-20 21:01:02,003 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 21:01:04,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4953790.0, ans=0.125 2024-08-20 21:01:32,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4953990.0, ans=0.125 2024-08-20 21:02:08,846 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6450, loss[loss=0.1152, beats_loss=0.01102, ecapa_loss=9.842e-05, whisper_loss=0.1032, over 24342.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001381, whisper_loss=0.0907, over 3874710.55 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:02:19,330 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4954190.0, ans=0.125 2024-08-20 21:02:21,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=4954190.0, ans=15.0 2024-08-20 21:02:53,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4954390.0, ans=0.0 2024-08-20 21:03:01,259 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. 
limit=22.5 2024-08-20 21:03:03,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4954390.0, ans=0.0 2024-08-20 21:03:08,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4954490.0, ans=0.2 2024-08-20 21:03:11,011 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.299e+01 2.553e+01 2.896e+01 1.351e+02, threshold=5.106e+01, percent-clipped=1.0 2024-08-20 21:03:31,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2024-08-20 21:03:39,388 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-20 21:03:41,117 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 21 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 21:03:44,165 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2024-08-20 21:03:44,646 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6500, loss[loss=0.08124, beats_loss=0.01132, ecapa_loss=0.0001448, whisper_loss=0.06847, over 17723.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001386, whisper_loss=0.0901, over 3846845.31 frames. ], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:04:18,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-20 21:04:57,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4954990.0, ans=0.2 2024-08-20 21:05:01,783 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 21:05:05,469 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4954990.0, ans=0.2 2024-08-20 21:05:13,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=15.0 2024-08-20 21:05:16,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4955090.0, ans=0.0 2024-08-20 21:05:27,175 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 21:05:35,959 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2024-08-20 21:05:38,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4955090.0, ans=0.125 2024-08-20 21:05:40,972 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6550, loss[loss=0.09772, beats_loss=0.009821, ecapa_loss=0.0001674, whisper_loss=0.08623, over 20080.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001395, whisper_loss=0.09002, over 3856330.66 frames. ], batch size: 84, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:05:43,619 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 
16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-20 21:06:14,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4955290.0, ans=0.0 2024-08-20 21:06:16,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4955290.0, ans=0.2 2024-08-20 21:06:33,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2024-08-20 21:06:50,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4955490.0, ans=0.5 2024-08-20 21:06:53,217 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 27 from LS+wenet, 27 from Vox, 17 fro AS 2024-08-20 21:06:57,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.272e+01 2.491e+01 2.852e+01 4.089e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 21:07:37,681 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6600, loss[loss=0.07407, beats_loss=0.01383, ecapa_loss=0.0001332, whisper_loss=0.05891, over 20044.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.00014, whisper_loss=0.09031, over 3876619.29 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:07:46,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4955690.0, ans=0.125 2024-08-20 21:07:52,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4955690.0, ans=0.0 2024-08-20 21:07:55,953 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.11 vs. 
limit=15.0 2024-08-20 21:08:09,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4955790.0, ans=0.125 2024-08-20 21:08:46,364 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 26 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-20 21:09:01,089 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-20 21:09:16,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4955990.0, ans=15.0 2024-08-20 21:09:25,155 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 21:09:37,290 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-08-20 21:09:47,608 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6650, loss[loss=0.1132, beats_loss=0.008198, ecapa_loss=0.0001804, whisper_loss=0.1032, over 22348.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.09058, over 3853077.98 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:09:53,217 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 21:10:02,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4956190.0, ans=0.0 2024-08-20 21:10:31,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4956390.0, ans=0.2 2024-08-20 21:10:46,531 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-20 21:10:47,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.323e+01 2.536e+01 2.903e+01 4.430e+01, threshold=5.073e+01, percent-clipped=0.0
2024-08-20 21:11:04,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.55 vs. limit=8.0
2024-08-20 21:11:18,826 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6700, loss[loss=0.0927, beats_loss=0.01097, ecapa_loss=0.0001595, whisper_loss=0.08014, over 15206.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.09103, over 3901649.31 frames. ], batch size: 62, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:11:24,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4956690.0, ans=0.0
2024-08-20 21:11:31,733 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 from AS
2024-08-20 21:11:33,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4956690.0, ans=0.0
2024-08-20 21:11:41,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=12.0
2024-08-20 21:11:56,478 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4956890.0, ans=0.0
2024-08-20 21:11:58,766 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4956890.0, ans=0.125
2024-08-20 21:12:09,032 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4956890.0, ans=0.0
2024-08-20 21:12:14,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4956990.0, ans=0.1
2024-08-20 21:12:46,391 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6750, loss[loss=0.1144, beats_loss=0.009294, ecapa_loss=0.000156, whisper_loss=0.1036, over 22743.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01026, ecapa_loss=0.0001411, whisper_loss=0.09136, over 3906212.23 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:12:50,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4957190.0, ans=0.125
2024-08-20 21:12:56,098 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 from AS
2024-08-20 21:13:18,050 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 from AS
2024-08-20 21:13:37,101 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS
2024-08-20 21:13:44,072 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.394e+01 2.668e+01 3.101e+01 4.157e+01, threshold=5.336e+01, percent-clipped=0.0
2024-08-20 21:14:12,990 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6800, loss[loss=0.081, beats_loss=0.01323, ecapa_loss=9.459e-05, whisper_loss=0.06683, over 17034.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01027, ecapa_loss=0.0001412, whisper_loss=0.09138, over 3919982.87 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:14:20,120 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS
2024-08-20 21:14:27,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4957690.0, ans=0.04949747468305833
2024-08-20 21:14:28,830 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 from AS
2024-08-20 21:14:30,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4957790.0, ans=0.125
2024-08-20 21:14:51,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4957890.0, ans=0.125
2024-08-20 21:15:08,322 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 from AS
2024-08-20 21:15:11,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0
2024-08-20 21:15:25,281 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0
2024-08-20 21:15:33,946 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 22 from LS+wenet, 11 from Vox, 38 from AS
2024-08-20 21:15:39,291 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6850, loss[loss=0.09737, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08554, over 22362.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001406, whisper_loss=0.0906, over 3899742.25 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:15:45,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4958190.0, ans=0.1
2024-08-20 21:15:53,144 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 from AS
2024-08-20 21:15:56,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4958290.0, ans=0.1
2024-08-20 21:16:06,845 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS
2024-08-20 21:16:14,064 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 from AS
2024-08-20 21:16:29,753 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 from AS
2024-08-20 21:16:36,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.275e+01 2.461e+01 2.676e+01 7.935e+01, threshold=4.923e+01, percent-clipped=1.0
2024-08-20 21:16:40,722 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4958490.0, ans=0.0
2024-08-20 21:16:54,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4958590.0, ans=0.2
2024-08-20 21:17:06,145 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6900, loss[loss=0.08645, beats_loss=0.01162, ecapa_loss=0.0001206, whisper_loss=0.07362, over 22509.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001399, whisper_loss=0.09023, over 3884561.87 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:17:12,571 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 36 from LS+wenet, 18 from Vox, 33 from AS
2024-08-20 21:17:28,028 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS
2024-08-20 21:17:49,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4958890.0, ans=0.125
2024-08-20 21:18:10,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0
2024-08-20 21:18:15,742 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4959090.0, ans=0.1
2024-08-20 21:18:24,284 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 from AS
2024-08-20 21:18:31,508 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5
2024-08-20 21:18:31,989 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 6950, loss[loss=0.1156, beats_loss=0.009116, ecapa_loss=0.0001491, whisper_loss=0.105, over 22830.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001393, whisper_loss=0.09131, over 3878773.24 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:18:42,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4959190.0, ans=0.125
2024-08-20 21:18:54,901 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 25 from LS+wenet, 11 from Vox, 24 from AS
2024-08-20 21:19:09,796 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2024-08-20 21:19:11,874 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 21:19:13,836 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4959390.0, ans=0.5
2024-08-20 21:19:27,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4959490.0, ans=0.125
2024-08-20 21:19:29,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.291e+01 2.502e+01 2.810e+01 1.652e+02, threshold=5.004e+01, percent-clipped=1.0
2024-08-20 21:19:33,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4959490.0, ans=0.125
2024-08-20 21:19:57,658 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 35 from Vox, 24 from AS
2024-08-20 21:19:58,205 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5
2024-08-20 21:19:58,703 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7000, loss[loss=0.09318, beats_loss=0.008181, ecapa_loss=0.000206, whisper_loss=0.08294, over 18000.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.09093, over 3835991.76 frames. ], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:20:18,599 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 12 from LS+wenet, 23 from Vox, 33 from AS
2024-08-20 21:20:30,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4959790.0, ans=0.1
2024-08-20 21:20:46,376 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 32 from LS+wenet, 13 from Vox, 25 from AS
2024-08-20 21:21:03,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.35 vs. limit=5.0
2024-08-20 21:21:15,159 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 from AS
2024-08-20 21:21:18,961 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 from AS
2024-08-20 21:21:29,268 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7050, loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001316, whisper_loss=0.09061, over 21418.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09023, over 3824780.45 frames. ], batch size: 84, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:21:56,676 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 32 from LS+wenet, 17 from Vox, 35 from AS
2024-08-20 21:22:21,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4960490.0, ans=0.125
2024-08-20 21:22:25,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.286e+01 2.529e+01 2.848e+01 4.260e+01, threshold=5.059e+01, percent-clipped=0.0
2024-08-20 21:22:47,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4960590.0, ans=0.1
2024-08-20 21:22:49,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4960590.0, ans=0.0
2024-08-20 21:22:53,512 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0
2024-08-20 21:22:55,538 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7100, loss[loss=0.08088, beats_loss=0.009746, ecapa_loss=0.0001248, whisper_loss=0.06989, over 12637.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001395, whisper_loss=0.08995, over 3785646.34 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:23:02,080 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5
2024-08-20 21:23:07,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4960690.0, ans=0.0
2024-08-20 21:23:14,113 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 13 from LS+wenet, 19 from Vox, 21 from AS
2024-08-20 21:23:14,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4960790.0, ans=0.125
2024-08-20 21:23:22,937 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 from AS
2024-08-20 21:23:32,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4960890.0, ans=0.125
2024-08-20 21:23:34,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.55 vs. limit=10.0
2024-08-20 21:23:37,475 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 from AS
2024-08-20 21:23:50,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4960990.0, ans=0.1
2024-08-20 21:23:51,683 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 from AS
2024-08-20 21:24:14,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4961090.0, ans=0.125
2024-08-20 21:24:19,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4961090.0, ans=0.125
2024-08-20 21:24:24,148 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7150, loss[loss=0.09232, beats_loss=0.009949, ecapa_loss=0.0001331, whisper_loss=0.08104, over 21852.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08951, over 3804369.88 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:24:27,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4961190.0, ans=0.1
2024-08-20 21:24:33,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4961190.0, ans=0.0
2024-08-20 21:24:34,815 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 from AS
2024-08-20 21:24:36,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4961190.0, ans=0.125
2024-08-20 21:24:37,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4961190.0, ans=0.2
2024-08-20 21:25:09,808 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 from AS
2024-08-20 21:25:21,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.246e+01 2.471e+01 2.747e+01 3.291e+02, threshold=4.942e+01, percent-clipped=1.0
2024-08-20 21:25:31,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4961490.0, ans=0.1
2024-08-20 21:25:34,450 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 from AS
2024-08-20 21:25:45,425 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4961590.0, ans=0.1
2024-08-20 21:25:51,819 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7200, loss[loss=0.09768, beats_loss=0.009529, ecapa_loss=0.0001484, whisper_loss=0.08667, over 17402.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.0895, over 3806162.43 frames. ], batch size: 69, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:25:55,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4961690.0, ans=0.125
2024-08-20 21:26:00,565 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0
2024-08-20 21:26:12,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4961790.0, ans=0.04949747468305833
2024-08-20 21:26:32,298 WARNING [optim.py:496] (2/4) Scaling gradients by 0.05625057592988014, model_norm_threshold=49.41666793823242
2024-08-20 21:26:32,468 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.330e+05, grad_sumsq=1.330e+05, orig_rms_sq=1.000e+00
2024-08-20 21:26:51,826 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 from AS
2024-08-20 21:26:59,711 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 18 from LS+wenet, 14 from Vox, 20 from AS
2024-08-20 21:27:20,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4962190.0, ans=0.0
2024-08-20 21:27:21,187 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7250, loss[loss=0.0815, beats_loss=0.01343, ecapa_loss=0.0001135, whisper_loss=0.06693, over 16077.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.08954, over 3775761.63 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:27:38,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4962290.0, ans=0.125
2024-08-20 21:27:44,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4962290.0, ans=0.0
2024-08-20 21:27:59,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4962390.0, ans=0.125
2024-08-20 21:28:18,698 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.295e+01 2.557e+01 2.872e+01 8.785e+02, threshold=5.114e+01, percent-clipped=5.0
2024-08-20 21:28:33,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4962590.0, ans=0.0
2024-08-20 21:28:38,406 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 from AS
2024-08-20 21:28:49,186 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7300, loss[loss=0.1208, beats_loss=0.008239, ecapa_loss=0.000126, whisper_loss=0.1113, over 23474.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001407, whisper_loss=0.09029, over 3800242.59 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:29:00,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4962690.0, ans=0.125
2024-08-20 21:29:01,219 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS
2024-08-20 21:29:09,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4962790.0, ans=0.0
2024-08-20 21:29:14,757 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS
2024-08-20 21:29:16,957 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0
2024-08-20 21:29:21,555 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS
2024-08-20 21:29:43,978 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 22 from LS+wenet, 30 from Vox, 35 from AS
2024-08-20 21:29:44,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4962990.0, ans=0.1
2024-08-20 21:29:46,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4962990.0, ans=0.125
2024-08-20 21:29:49,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4962990.0, ans=0.09899494936611666
2024-08-20 21:30:05,882 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4963090.0, ans=0.125
2024-08-20 21:30:15,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7350, loss[loss=0.105, beats_loss=0.01145, ecapa_loss=0.000167, whisper_loss=0.09185, over 22138.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.0895, over 3798747.83 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:30:23,266 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4963190.0, ans=0.2
2024-08-20 21:30:24,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4963190.0, ans=0.1
2024-08-20 21:30:33,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4963290.0, ans=0.2
2024-08-20 21:30:36,607 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4963290.0, ans=0.125
2024-08-20 21:31:09,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4963490.0, ans=0.125
2024-08-20 21:31:11,466 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.257e+01 2.510e+01 2.739e+01 2.616e+02, threshold=5.019e+01, percent-clipped=1.0
2024-08-20 21:31:22,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4963590.0, ans=0.125
2024-08-20 21:31:25,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4963590.0, ans=0.1
2024-08-20 21:31:40,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4963690.0, ans=0.125
2024-08-20 21:31:40,858 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7400, loss[loss=0.1135, beats_loss=0.01074, ecapa_loss=0.0001353, whisper_loss=0.1015, over 23059.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.0892, over 3770071.15 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:31:45,255 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4963690.0, ans=0.125
2024-08-20 21:31:50,324 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 22 from LS+wenet, 19 from Vox, 50 from AS
2024-08-20 21:31:57,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4963790.0, ans=0.1
2024-08-20 21:32:14,275 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 from AS
2024-08-20 21:32:17,970 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 from AS
2024-08-20 21:32:23,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4963890.0, ans=0.1
2024-08-20 21:32:30,903 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 21:32:36,293 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4963990.0, ans=0.0
2024-08-20 21:32:37,949 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS
2024-08-20 21:32:39,539 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 13 from LS+wenet, 18 from Vox, 21 from AS
2024-08-20 21:32:41,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4963990.0, ans=0.07
2024-08-20 21:32:43,253 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 from AS
2024-08-20 21:32:54,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4964090.0, ans=0.2
2024-08-20 21:33:08,796 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 14 from LS+wenet, 21 from Vox, 17 from AS
2024-08-20 21:33:09,746 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7450, loss[loss=0.09084, beats_loss=0.008017, ecapa_loss=0.0001546, whisper_loss=0.08128, over 13405.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01026, ecapa_loss=0.0001417, whisper_loss=0.09053, over 3794343.07 frames. ], batch size: 52, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:33:14,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4964190.0, ans=0.125
2024-08-20 21:33:36,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4964290.0, ans=0.125
2024-08-20 21:33:41,281 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS
2024-08-20 21:33:41,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4964290.0, ans=0.0
2024-08-20 21:33:45,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4964390.0, ans=0.1
2024-08-20 21:33:49,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5
2024-08-20 21:33:50,551 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0
2024-08-20 21:34:06,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4964490.0, ans=0.125
2024-08-20 21:34:08,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.276e+01 2.553e+01 2.837e+01 3.852e+01, threshold=5.106e+01, percent-clipped=0.0
2024-08-20 21:34:31,108 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 22 from LS+wenet, 22 from Vox, 21 from AS
2024-08-20 21:34:31,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4964590.0, ans=0.0
2024-08-20 21:34:39,233 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7500, loss[loss=0.1069, beats_loss=0.01157, ecapa_loss=0.0001729, whisper_loss=0.0936, over 19957.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.0001426, whisper_loss=0.09083, over 3787356.76 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:34:55,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4964790.0, ans=0.125
2024-08-20 21:35:02,498 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 from AS
2024-08-20 21:35:16,623 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 from AS
2024-08-20 21:35:21,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4964890.0, ans=0.1
2024-08-20 21:35:25,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4964890.0, ans=0.025
2024-08-20 21:35:45,627 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 13 from LS+wenet, 11 from Vox, 26 from AS
2024-08-20 21:36:01,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5
2024-08-20 21:36:05,648 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7550, loss[loss=0.09684, beats_loss=0.01099, ecapa_loss=0.0001195, whisper_loss=0.08466, over 19412.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.08992, over 3790821.59 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:36:17,111 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0
2024-08-20 21:36:21,833 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 25 from Vox, 27 from AS
2024-08-20 21:36:25,942 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4965290.0, ans=0.05
2024-08-20 21:36:28,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4965290.0, ans=0.125
2024-08-20 21:37:00,050 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 28 from LS+wenet, 25 from Vox, 32 from AS
2024-08-20 21:37:01,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4965490.0, ans=0.2
2024-08-20 21:37:02,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.306e+01 2.507e+01 2.711e+01 6.032e+01, threshold=5.014e+01, percent-clipped=1.0
2024-08-20 21:37:05,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0
2024-08-20 21:37:07,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4965490.0, ans=0.125
2024-08-20 21:37:08,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5
2024-08-20 21:37:31,661 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7600, loss[loss=0.08804, beats_loss=0.01253, ecapa_loss=0.0001625, whisper_loss=0.07389, over 19104.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01024, ecapa_loss=0.0001428, whisper_loss=0.0902, over 3784234.57 frames. ], batch size: 78, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:37:40,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4965690.0, ans=0.125
2024-08-20 21:37:40,894 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.98 vs. limit=22.5
2024-08-20 21:37:45,165 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 from AS
2024-08-20 21:37:50,205 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 16 from Vox, 51 from AS
2024-08-20 21:37:54,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0
2024-08-20 21:37:54,567 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0
2024-08-20 21:38:20,341 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2024-08-20 21:38:25,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4965990.0, ans=0.125
2024-08-20 21:38:39,326 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4966090.0, ans=0.125
2024-08-20 21:38:44,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4966090.0, ans=0.0
2024-08-20 21:38:56,968 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7650, loss[loss=0.1169, beats_loss=0.01067, ecapa_loss=0.0001301, whisper_loss=0.1049, over 23197.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01021, ecapa_loss=0.0001419, whisper_loss=0.09034, over 3777962.73 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:39:07,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4966190.0, ans=0.1
2024-08-20 21:39:18,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4966290.0, ans=0.125
2024-08-20 21:39:21,646 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 from AS
2024-08-20 21:39:48,183 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0
2024-08-20 21:39:48,895 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 20 from LS+wenet, 13 from Vox, 18 from AS
2024-08-20 21:39:50,878 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 21 from LS+wenet, 19 from Vox, 41 from AS
2024-08-20 21:39:53,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.307e+01 2.519e+01 2.833e+01 3.884e+01, threshold=5.038e+01, percent-clipped=0.0
2024-08-20 21:40:07,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4966590.0, ans=0.125
2024-08-20 21:40:23,460 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7700, loss[loss=0.09457, beats_loss=0.008703, ecapa_loss=0.0001493, whisper_loss=0.08437, over 22303.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01019, ecapa_loss=0.0001417, whisper_loss=0.09091, over 3799888.15 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:40:30,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4966690.0, ans=0.0
2024-08-20 21:41:04,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4966890.0, ans=0.125
2024-08-20 21:41:08,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4966890.0, ans=0.5
2024-08-20 21:41:08,202 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=22.5
2024-08-20 21:41:18,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4966990.0, ans=0.125
2024-08-20 21:41:19,836 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 from AS
2024-08-20 21:41:48,703 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7750, loss[loss=0.121, beats_loss=0.0081, ecapa_loss=0.0001369, whisper_loss=0.1115, over 18863.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01027, ecapa_loss=0.0001413, whisper_loss=0.0907, over 3805002.85 frames. ], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:41:56,341 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 29 from LS+wenet, 20 from Vox, 22 from AS
2024-08-20 21:41:56,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4967190.0, ans=0.2
2024-08-20 21:42:06,767 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 from AS
2024-08-20 21:42:14,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4967290.0, ans=0.0
2024-08-20 21:42:26,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4967390.0, ans=0.0
2024-08-20 21:42:47,270 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.264e+01 2.525e+01 2.747e+01 3.905e+01, threshold=5.051e+01, percent-clipped=0.0
2024-08-20 21:42:50,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0
2024-08-20 21:42:57,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.34 vs. limit=22.5
2024-08-20 21:42:58,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4967590.0, ans=0.125
2024-08-20 21:43:00,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4967590.0, ans=0.1
2024-08-20 21:43:04,294 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0
2024-08-20 21:43:16,720 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7800, loss[loss=0.08246, beats_loss=0.01013, ecapa_loss=0.0001683, whisper_loss=0.07065, over 20790.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01031, ecapa_loss=0.0001406, whisper_loss=0.09009, over 3795869.63 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:43:19,192 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4967690.0, ans=0.125
2024-08-20 21:43:25,781 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS
2024-08-20 21:43:53,872 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4967890.0, ans=0.125
2024-08-20 21:44:11,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4967990.0, ans=0.2
2024-08-20 21:44:16,370 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 19 from Vox, 24 from AS
2024-08-20 21:44:31,958 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 14 from LS+wenet, 14 from Vox, 24 from AS
2024-08-20 21:44:43,124 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7850, loss[loss=0.09132, beats_loss=0.009575, ecapa_loss=0.0001888, whisper_loss=0.07986, over 19275.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001401, whisper_loss=0.09034, over 3804793.35 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 21:45:26,140 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts.
18 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 21:45:28,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4968390.0, ans=0.125 2024-08-20 21:45:30,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4968390.0, ans=0.125 2024-08-20 21:45:31,189 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 21 from LS+wenet, 7 from Vox, 32 fro AS 2024-08-20 21:45:41,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.317e+01 2.497e+01 2.913e+01 5.826e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 21:45:47,372 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 21:45:50,509 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 21:45:56,561 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-20 21:46:08,506 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-20 21:46:10,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4968690.0, ans=0.2 2024-08-20 21:46:11,159 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7900, loss[loss=0.1235, beats_loss=0.009153, ecapa_loss=0.0001223, whisper_loss=0.1131, over 24253.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.09042, over 3820765.81 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:46:17,938 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 21:46:35,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4968790.0, ans=0.1 2024-08-20 21:46:41,952 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 38 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 21:46:49,183 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 24 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-20 21:47:35,339 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 21 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 21:47:38,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4969190.0, ans=0.0 2024-08-20 21:47:38,926 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 7950, loss[loss=0.09635, beats_loss=0.009698, ecapa_loss=0.0001304, whisper_loss=0.08535, over 18310.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001391, whisper_loss=0.09007, over 3790473.49 frames. ], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:47:46,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4969190.0, ans=0.125 2024-08-20 21:47:49,317 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 17 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-20 21:47:52,348 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4969190.0, ans=0.0 2024-08-20 21:48:02,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4969290.0, ans=0.0 2024-08-20 21:48:14,257 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 33 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 21:48:26,881 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
22 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 21:48:32,197 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 21:48:37,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.335e+01 2.581e+01 2.814e+01 4.962e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 21:48:47,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-20 21:48:52,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4969590.0, ans=0.0 2024-08-20 21:49:04,291 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 21:49:07,277 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8000, loss[loss=0.1022, beats_loss=0.01081, ecapa_loss=0.0001162, whisper_loss=0.09024, over 23345.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01028, ecapa_loss=0.0001391, whisper_loss=0.09045, over 3827242.18 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:49:07,989 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 21:49:22,917 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4969690.0, ans=0.95 2024-08-20 21:49:26,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4969790.0, ans=0.125 2024-08-20 21:49:36,961 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 21:49:49,242 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
30 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 21:50:00,726 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0865812599658966, model_norm_threshold=51.61667251586914 2024-08-20 21:50:00,896 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.075e+04, grad_sumsq=5.075e+04, orig_rms_sq=1.000e+00 2024-08-20 21:50:08,258 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-20 21:50:19,863 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 21:50:30,958 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2024-08-20 21:50:34,968 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8050, loss[loss=0.09651, beats_loss=0.01041, ecapa_loss=0.0001631, whisper_loss=0.08447, over 19241.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.09092, over 3845000.79 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:50:42,737 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-20 21:50:58,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0 2024-08-20 21:51:10,030 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 26 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 21:51:37,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.558e+01 2.951e+01 5.962e+02, threshold=5.117e+01, percent-clipped=2.0 2024-08-20 21:51:55,935 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-20 21:52:03,753 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8100, loss[loss=0.08634, beats_loss=0.01427, ecapa_loss=0.0001097, whisper_loss=0.07097, over 14935.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.09088, over 3814184.60 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:52:21,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4970790.0, ans=0.0 2024-08-20 21:52:32,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4970790.0, ans=0.0 2024-08-20 21:52:48,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4970890.0, ans=0.09899494936611666 2024-08-20 21:52:57,980 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 25 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-20 21:53:13,806 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 27 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 21:53:15,696 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 21:53:17,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4971090.0, ans=0.0 2024-08-20 21:53:26,281 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 21:53:34,294 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8150, loss[loss=0.1195, beats_loss=0.009358, ecapa_loss=0.0001322, whisper_loss=0.1088, over 23047.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01038, ecapa_loss=0.0001383, whisper_loss=0.09158, over 3780522.90 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:53:41,470 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 21:53:46,927 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 25 from LS+wenet, 19 from Vox, 11 fro AS 2024-08-20 21:54:19,568 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-20 21:54:22,644 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4971390.0, ans=0.0 2024-08-20 21:54:23,847 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-20 21:54:28,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4971490.0, ans=22.5 2024-08-20 21:54:37,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.199e+01 2.482e+01 2.728e+01 1.075e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-20 21:54:43,131 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 21:54:43,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4971490.0, ans=0.125 2024-08-20 21:55:01,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4971590.0, ans=0.125 2024-08-20 21:55:04,250 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8200, loss[loss=0.1017, beats_loss=0.008914, ecapa_loss=0.0001324, whisper_loss=0.0915, over 18086.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01038, ecapa_loss=0.0001376, whisper_loss=0.0912, over 3773194.42 frames. 
], batch size: 71, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:55:13,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4971690.0, ans=0.125 2024-08-20 21:55:31,268 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 21:56:28,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4972090.0, ans=0.2 2024-08-20 21:56:31,219 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 21:56:35,515 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8250, loss[loss=0.06815, beats_loss=0.01292, ecapa_loss=0.0001272, whisper_loss=0.05395, over 15712.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.000138, whisper_loss=0.09021, over 3767281.21 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:56:38,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4972190.0, ans=0.125 2024-08-20 21:57:01,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4972290.0, ans=0.0 2024-08-20 21:57:02,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4972290.0, ans=0.0 2024-08-20 21:57:13,564 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-20 21:57:20,121 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 21:57:35,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4972490.0, ans=0.125 2024-08-20 21:57:37,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.205e+01 2.481e+01 2.878e+01 4.142e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-20 21:57:51,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=12.0 2024-08-20 21:58:02,897 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8300, loss[loss=0.1127, beats_loss=0.007813, ecapa_loss=0.000157, whisper_loss=0.1033, over 12792.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.09012, over 3778659.28 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:58:04,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4972690.0, ans=0.0 2024-08-20 21:58:18,920 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 21:58:32,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4972790.0, ans=0.125 2024-08-20 21:58:44,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4972890.0, ans=0.125 2024-08-20 21:59:06,870 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
25 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 21:59:18,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4973090.0, ans=0.125 2024-08-20 21:59:21,503 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4973090.0, ans=0.2 2024-08-20 21:59:31,381 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8350, loss[loss=0.1156, beats_loss=0.008526, ecapa_loss=0.0001379, whisper_loss=0.1057, over 13932.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.09004, over 3830992.19 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:59:34,756 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=10.0 2024-08-20 21:59:37,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4973190.0, ans=15.0 2024-08-20 21:59:45,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4973190.0, ans=0.0 2024-08-20 21:59:46,247 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 22:00:08,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4973390.0, ans=0.125 2024-08-20 22:00:15,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4973390.0, ans=0.0 2024-08-20 22:00:16,474 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 22:00:33,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.301e+01 2.484e+01 2.727e+01 5.310e+01, threshold=4.967e+01, percent-clipped=1.0 2024-08-20 22:00:34,462 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 22:00:50,376 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 28 from LS+wenet, 8 from Vox, 19 fro AS 2024-08-20 22:00:58,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4973690.0, ans=0.125 2024-08-20 22:00:58,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4973690.0, ans=0.5 2024-08-20 22:00:59,447 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8400, loss[loss=0.09757, beats_loss=0.01032, ecapa_loss=0.0001703, whisper_loss=0.08555, over 21033.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.09001, over 3825134.71 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:01:39,708 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-20 22:01:55,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4973990.0, ans=0.2 2024-08-20 22:01:59,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4973990.0, ans=0.1 2024-08-20 22:02:28,168 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8450, loss[loss=0.1082, beats_loss=0.00902, ecapa_loss=0.0001505, whisper_loss=0.09765, over 23001.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.09094, over 3825750.25 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:02:29,208 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4974190.0, ans=0.07 2024-08-20 22:02:59,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4974290.0, ans=0.0 2024-08-20 22:03:25,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2024-08-20 22:03:27,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4974490.0, ans=0.0 2024-08-20 22:03:32,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.304e+01 2.514e+01 2.804e+01 1.040e+02, threshold=5.029e+01, percent-clipped=2.0 2024-08-20 22:03:46,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4974590.0, ans=0.2 2024-08-20 22:03:50,121 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-20 22:03:56,793 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4974590.0, ans=0.2 2024-08-20 22:03:59,167 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8500, loss[loss=0.1053, beats_loss=0.01041, ecapa_loss=0.0001378, whisper_loss=0.0935, over 22876.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01027, ecapa_loss=0.000139, whisper_loss=0.09085, over 3827786.28 frames. 
], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:04:05,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4974690.0, ans=0.125 2024-08-20 22:04:18,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4974790.0, ans=0.0 2024-08-20 22:04:25,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4974790.0, ans=0.0 2024-08-20 22:04:27,979 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2024-08-20 22:04:39,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4974890.0, ans=0.1 2024-08-20 22:04:41,654 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4974890.0, ans=0.125 2024-08-20 22:05:31,233 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8550, loss[loss=0.1046, beats_loss=0.01163, ecapa_loss=0.0001249, whisper_loss=0.09176, over 20610.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01023, ecapa_loss=0.0001401, whisper_loss=0.09155, over 3869916.53 frames. ], batch size: 82, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:05:49,372 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 22:05:57,316 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4975290.0, ans=0.0 2024-08-20 22:06:04,163 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 22:06:18,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4975390.0, ans=0.0 2024-08-20 22:06:22,740 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 22:06:33,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.271e+01 2.473e+01 2.779e+01 6.630e+01, threshold=4.947e+01, percent-clipped=2.0 2024-08-20 22:06:37,092 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 22:06:51,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4975590.0, ans=0.0 2024-08-20 22:06:59,609 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8600, loss[loss=0.09189, beats_loss=0.01141, ecapa_loss=0.0001473, whisper_loss=0.079, over 22709.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01019, ecapa_loss=0.0001406, whisper_loss=0.09184, over 3863503.99 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:07:07,585 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4975690.0, ans=0.125 2024-08-20 22:07:07,946 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-20 22:07:18,580 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-20 22:07:43,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4975890.0, ans=0.1 2024-08-20 22:07:58,398 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
29 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 22:07:59,822 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 22:08:13,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4975990.0, ans=0.0 2024-08-20 22:08:16,583 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-20 22:08:21,892 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 22:08:31,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4976090.0, ans=0.125 2024-08-20 22:08:34,389 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8650, loss[loss=0.1204, beats_loss=0.008442, ecapa_loss=0.0001436, whisper_loss=0.1106, over 17235.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01024, ecapa_loss=0.0001396, whisper_loss=0.09151, over 3869117.81 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:08:34,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4976190.0, ans=0.125 2024-08-20 22:08:48,899 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4976190.0, ans=0.125 2024-08-20 22:08:48,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4976190.0, ans=0.125 2024-08-20 22:08:50,349 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 27 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-20 22:08:56,134 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 21 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 22:09:03,159 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 
21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 22:09:06,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4976290.0, ans=0.125 2024-08-20 22:09:09,189 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-20 22:09:11,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-20 22:09:31,042 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 21 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 22:09:36,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4976490.0, ans=10.0 2024-08-20 22:09:37,400 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.289e+01 2.459e+01 2.667e+01 5.043e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-20 22:09:38,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=22.5 2024-08-20 22:09:39,527 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 22:09:39,984 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4976490.0, ans=0.125 2024-08-20 22:09:49,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4976590.0, ans=0.125 2024-08-20 22:09:56,838 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-20 22:10:00,528 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 22:10:04,899 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8700, loss[loss=0.1152, beats_loss=0.009839, ecapa_loss=0.0001336, whisper_loss=0.1041, over 20814.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001386, whisper_loss=0.09085, over 3838620.77 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:10:28,501 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4976790.0, ans=0.125 2024-08-20 22:10:32,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4976790.0, ans=0.2 2024-08-20 22:10:36,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4976790.0, ans=0.125 2024-08-20 22:11:00,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4976890.0, ans=0.125 2024-08-20 22:11:01,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4976890.0, ans=0.025 2024-08-20 22:11:10,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4976990.0, ans=0.2 2024-08-20 22:11:30,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4977090.0, ans=0.1 2024-08-20 22:11:39,635 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0 2024-08-20 22:11:40,074 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8750, loss[loss=0.08006, beats_loss=0.01112, ecapa_loss=0.0001095, whisper_loss=0.06785, over 13767.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01031, ecapa_loss=0.0001376, whisper_loss=0.09192, over 3852700.84 frames. ], batch size: 52, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:11:44,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4977190.0, ans=0.125 2024-08-20 22:12:05,113 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 10 from LS+wenet, 12 from Vox, 28 from AS 2024-08-20 22:12:12,082 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 from AS 2024-08-20 22:12:20,715 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 27 from LS+wenet, 21 from Vox, 24 from AS 2024-08-20 22:12:32,514 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4977490.0, ans=0.1 2024-08-20 22:12:32,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4977490.0, ans=0.09899494936611666 2024-08-20 22:12:41,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.311e+01 2.545e+01 2.845e+01 5.108e+01, threshold=5.089e+01, percent-clipped=1.0 2024-08-20 22:13:03,813 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4977590.0, ans=0.125 2024-08-20 22:13:08,479 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8800, loss[loss=0.0911, beats_loss=0.0117, ecapa_loss=0.0001175, whisper_loss=0.07823, over 23729.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001374, whisper_loss=0.09136, over 3859920.37 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:13:22,129 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
18 from LS+wenet, 19 from Vox, 38 from AS 2024-08-20 22:13:22,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4977690.0, ans=0.125 2024-08-20 22:13:27,410 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 from AS 2024-08-20 22:13:30,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4977790.0, ans=0.125 2024-08-20 22:13:32,238 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 from AS 2024-08-20 22:13:34,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4977790.0, ans=0.125 2024-08-20 22:13:58,011 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 from AS 2024-08-20 22:14:14,130 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4977990.0, ans=0.1 2024-08-20 22:14:30,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4978090.0, ans=0.125 2024-08-20 22:14:36,182 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8850, loss[loss=0.07527, beats_loss=0.01219, ecapa_loss=0.0001255, whisper_loss=0.06182, over 21027.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001374, whisper_loss=0.09057, over 3862802.34 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:14:44,773 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.75 vs. 
limit=15.0 2024-08-20 22:15:37,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.277e+01 2.445e+01 2.853e+01 5.587e+01, threshold=4.890e+01, percent-clipped=1.0 2024-08-20 22:15:40,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4978490.0, ans=0.125 2024-08-20 22:16:04,082 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8900, loss[loss=0.0982, beats_loss=0.009895, ecapa_loss=0.0001277, whisper_loss=0.08703, over 22909.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001391, whisper_loss=0.09074, over 3843736.43 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:16:05,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4978690.0, ans=0.125 2024-08-20 22:16:10,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4978690.0, ans=0.125 2024-08-20 22:16:18,563 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 from AS 2024-08-20 22:16:23,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-20 22:16:29,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4978790.0, ans=0.125 2024-08-20 22:16:41,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4978890.0, ans=0.125 2024-08-20 22:17:08,094 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 from AS 2024-08-20 22:17:32,256 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 8950, loss[loss=0.1049, beats_loss=0.01071, ecapa_loss=0.0001512, whisper_loss=0.09264, over 22338.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0103, ecapa_loss=0.0001401, whisper_loss=0.09056, over 3821997.87 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:17:41,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-20 22:17:43,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4979190.0, ans=0.125 2024-08-20 22:17:46,147 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 35 from Vox, 29 from AS 2024-08-20 22:17:58,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4979290.0, ans=0.125 2024-08-20 22:18:27,948 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4979490.0, ans=0.125 2024-08-20 22:18:32,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.269e+01 2.591e+01 2.866e+01 4.170e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-20 22:18:58,009 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 33 from LS+wenet, 14 from Vox, 41 from AS 2024-08-20 22:18:58,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4979690.0, ans=0.0 2024-08-20 22:18:59,284 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9000, loss[loss=0.1181, beats_loss=0.01233, ecapa_loss=0.0001306, whisper_loss=0.1044, over 21998.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001406, whisper_loss=0.09003, over 3826555.31 frames. 
], batch size: 88, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:18:59,284 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 22:19:38,078 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005128, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 22:20:04,897 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on SV_voxceleb1: loss=0.003932, beats_loss=0, ecapa_loss=0.0003932, whisper_loss=0, over 944235.00 frames. 2024-08-20 22:21:44,511 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 22:21:44,514 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 22:21:46,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4979690.0, ans=0.0 2024-08-20 22:22:14,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2024-08-20 22:22:20,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4979890.0, ans=0.125 2024-08-20 22:22:34,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4979890.0, ans=0.0 2024-08-20 22:22:36,108 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4979990.0, ans=0.2 2024-08-20 22:23:05,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4980090.0, ans=0.0 2024-08-20 22:23:11,223 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9050, loss[loss=0.1144, beats_loss=0.009801, ecapa_loss=0.0001264, whisper_loss=0.1034, over 23453.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001395, whisper_loss=0.08948, over 3825053.36 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:23:11,390 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 from AS 2024-08-20 22:23:37,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4980290.0, ans=0.125 2024-08-20 22:23:55,079 WARNING [optim.py:496] (2/4) Scaling gradients by 0.043528925627470016, model_norm_threshold=51.82819747924805 2024-08-20 22:23:55,250 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.654e+05, grad_sumsq=2.654e+05, orig_rms_sq=1.000e+00 2024-08-20 22:23:56,520 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. 
limit=10.0 2024-08-20 22:24:08,379 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 from AS 2024-08-20 22:24:11,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.252e+01 2.512e+01 2.739e+01 1.191e+03, threshold=5.024e+01, percent-clipped=1.0 2024-08-20 22:24:30,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4980590.0, ans=0.0 2024-08-20 22:24:32,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4980590.0, ans=0.125 2024-08-20 22:24:32,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4980590.0, ans=0.0 2024-08-20 22:24:36,504 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9100, loss[loss=0.08249, beats_loss=0.01375, ecapa_loss=9.17e-05, whisper_loss=0.06782, over 20342.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.08984, over 3807870.71 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:24:37,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4980690.0, ans=0.125 2024-08-20 22:24:37,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4980690.0, ans=0.2 2024-08-20 22:24:55,449 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.64 vs. 
limit=12.0 2024-08-20 22:24:57,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4980790.0, ans=0.0 2024-08-20 22:25:05,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4980790.0, ans=0.0 2024-08-20 22:25:06,994 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4980790.0, ans=0.025 2024-08-20 22:25:10,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4980890.0, ans=0.0 2024-08-20 22:25:13,930 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 from AS 2024-08-20 22:25:22,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. limit=10.0 2024-08-20 22:25:40,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4980990.0, ans=0.125 2024-08-20 22:25:47,362 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-20 22:25:59,828 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9150, loss[loss=0.07773, beats_loss=0.01098, ecapa_loss=0.0001815, whisper_loss=0.06494, over 15455.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.08993, over 3812588.64 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:26:08,861 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-20 22:26:11,886 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
24 from LS+wenet, 17 from Vox, 28 from AS 2024-08-20 22:26:52,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4981490.0, ans=10.0 2024-08-20 22:26:58,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.233e+01 2.459e+01 2.675e+01 3.716e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-20 22:27:02,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4981490.0, ans=0.09899494936611666 2024-08-20 22:27:19,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-20 22:27:23,687 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9200, loss[loss=0.08582, beats_loss=0.01012, ecapa_loss=0.0001322, whisper_loss=0.07437, over 19566.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001381, whisper_loss=0.09013, over 3821741.22 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:28:23,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4981990.0, ans=0.0 2024-08-20 22:28:29,135 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2024-08-20 22:28:29,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=15.0 2024-08-20 22:28:48,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9250, loss[loss=0.09697, beats_loss=0.0109, ecapa_loss=0.000121, whisper_loss=0.08485, over 15954.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001386, whisper_loss=0.09055, over 3834096.99 frames. 
], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:28:53,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4982190.0, ans=0.0 2024-08-20 22:28:53,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4982190.0, ans=0.125 2024-08-20 22:28:58,837 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 11 from LS+wenet, 26 from Vox, 30 from AS 2024-08-20 22:29:04,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4982290.0, ans=0.1 2024-08-20 22:29:15,798 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 from AS 2024-08-20 22:29:26,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4982390.0, ans=0.07 2024-08-20 22:29:31,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4982390.0, ans=0.07 2024-08-20 22:29:46,760 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-20 22:29:47,632 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-20 22:29:49,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.258e+01 2.506e+01 2.833e+01 3.659e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-20 22:29:52,012 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS 2024-08-20 22:30:15,782 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9300, loss[loss=0.1114, beats_loss=0.008852, ecapa_loss=0.0001289, whisper_loss=0.1013, over 23410.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.000138, whisper_loss=0.09085, over 3841711.66 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:30:20,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4982690.0, ans=0.125 2024-08-20 22:30:35,974 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 from AS 2024-08-20 22:30:37,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4982790.0, ans=0.2 2024-08-20 22:30:41,639 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2024-08-20 22:30:46,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4982790.0, ans=0.125 2024-08-20 22:30:54,651 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4982890.0, ans=0.125 2024-08-20 22:31:13,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4982990.0, ans=0.0 2024-08-20 22:31:26,769 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 20 from LS+wenet, 21 from Vox, 13 from AS 2024-08-20 22:31:35,770 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 from AS 2024-08-20 22:31:39,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4983090.0, ans=0.0 2024-08-20 22:31:42,223 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9350, loss[loss=0.1127, beats_loss=0.008906, ecapa_loss=0.0001413, whisper_loss=0.1024, over 19364.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001379, whisper_loss=0.09042, over 3853901.95 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:31:46,935 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4983190.0, ans=0.1 2024-08-20 22:31:58,001 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 from AS 2024-08-20 22:32:03,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4983290.0, ans=0.0 2024-08-20 22:32:13,798 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.124e+05 2024-08-20 22:32:19,742 WARNING [optim.py:496] (2/4) Scaling gradients by 0.011810386553406715, model_norm_threshold=50.11380386352539 2024-08-20 22:32:19,909 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.320e+06, grad_sumsq=2.158e+08, orig_rms_sq=1.075e-02 2024-08-20 22:32:22,323 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 from AS 2024-08-20 22:32:22,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.95 vs. limit=22.5 2024-08-20 22:32:22,819 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2024-08-20 22:32:33,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.26 vs. 
limit=12.0 2024-08-20 22:32:35,319 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4983490.0, ans=0.1 2024-08-20 22:32:41,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.343e+01 2.545e+01 2.927e+01 4.243e+03, threshold=5.090e+01, percent-clipped=3.0 2024-08-20 22:32:57,463 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 from AS 2024-08-20 22:32:59,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4983590.0, ans=0.125 2024-08-20 22:33:06,823 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9400, loss[loss=0.081, beats_loss=0.01069, ecapa_loss=0.0001386, whisper_loss=0.06892, over 15613.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001379, whisper_loss=0.08993, over 3846586.02 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:33:24,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5 2024-08-20 22:33:44,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4983890.0, ans=0.1 2024-08-20 22:34:16,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4984090.0, ans=0.125 2024-08-20 22:34:33,354 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9450, loss[loss=0.09574, beats_loss=0.01037, ecapa_loss=0.0001312, whisper_loss=0.08406, over 19459.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001368, whisper_loss=0.08952, over 3851697.48 frames. ], batch size: 76, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:34:45,585 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
18 from LS+wenet, 10 from Vox, 31 from AS 2024-08-20 22:34:59,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4984290.0, ans=0.2 2024-08-20 22:35:01,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4984290.0, ans=0.125 2024-08-20 22:35:32,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.335e+01 2.553e+01 2.784e+01 4.072e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-20 22:35:57,741 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 from AS 2024-08-20 22:35:58,725 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9500, loss[loss=0.09479, beats_loss=0.009203, ecapa_loss=0.0001653, whisper_loss=0.08393, over 16768.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001382, whisper_loss=0.08962, over 3840904.92 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:36:02,450 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4984690.0, ans=0.125 2024-08-20 22:36:35,115 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 from AS 2024-08-20 22:36:44,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4984890.0, ans=0.125 2024-08-20 22:36:45,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4984890.0, ans=0.125 2024-08-20 22:36:59,950 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4984990.0, ans=0.0 2024-08-20 22:37:01,071 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 from AS 2024-08-20 22:37:26,221 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9550, loss[loss=0.1068, beats_loss=0.007535, ecapa_loss=0.0001219, whisper_loss=0.09808, over 16584.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.0001381, whisper_loss=0.08906, over 3811071.43 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:37:49,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4985290.0, ans=0.07 2024-08-20 22:37:54,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4985290.0, ans=0.1 2024-08-20 22:38:22,444 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 17 from LS+wenet, 20 from Vox, 28 from AS 2024-08-20 22:38:24,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.187e+01 2.362e+01 2.605e+01 3.929e+01, threshold=4.725e+01, percent-clipped=0.0 2024-08-20 22:38:42,695 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-20 22:38:46,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4985590.0, ans=0.1 2024-08-20 22:38:51,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2024-08-20 22:38:51,967 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9600, loss[loss=0.09377, beats_loss=0.01043, ecapa_loss=0.0001252, whisper_loss=0.08209, over 18213.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.08946, over 3792511.17 frames. 
], batch size: 70, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:38:52,695 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4985690.0, ans=0.125 2024-08-20 22:38:52,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4985690.0, ans=0.1 2024-08-20 22:39:04,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=12.0 2024-08-20 22:39:08,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4985790.0, ans=0.0 2024-08-20 22:39:27,533 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4985890.0, ans=0.0 2024-08-20 22:39:28,802 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-20 22:39:30,922 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2024-08-20 22:39:33,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4985890.0, ans=0.09899494936611666 2024-08-20 22:39:55,423 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4985990.0, ans=0.125 2024-08-20 22:40:05,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4986090.0, ans=0.2 2024-08-20 22:40:09,378 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. 
limit=22.5 2024-08-20 22:40:11,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=4986090.0, ans=22.5 2024-08-20 22:40:14,117 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 from AS 2024-08-20 22:40:21,437 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9650, loss[loss=0.09858, beats_loss=0.0104, ecapa_loss=0.000165, whisper_loss=0.08653, over 16264.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.08966, over 3802191.36 frames. ], batch size: 70, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:40:24,763 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-20 22:40:43,513 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-08-20 22:41:22,168 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.288e+01 2.415e+01 2.707e+01 3.907e+01, threshold=4.829e+01, percent-clipped=0.0 2024-08-20 22:41:32,765 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 28 from LS+wenet, 29 from Vox, 30 from AS 2024-08-20 22:41:48,630 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9700, loss[loss=0.1056, beats_loss=0.01292, ecapa_loss=0.0001119, whisper_loss=0.09151, over 22553.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001386, whisper_loss=0.08897, over 3804557.11 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:41:54,045 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 from AS 2024-08-20 22:41:57,360 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-20 22:42:04,215 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
31 from LS+wenet, 18 from Vox, 35 from AS 2024-08-20 22:42:40,296 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 12 from LS+wenet, 15 from Vox, 23 from AS 2024-08-20 22:42:42,895 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4986990.0, ans=0.125 2024-08-20 22:42:45,247 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS 2024-08-20 22:43:06,849 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.681e+05 2024-08-20 22:43:15,063 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9750, loss[loss=0.0738, beats_loss=0.01454, ecapa_loss=0.0001031, whisper_loss=0.05822, over 17242.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01056, ecapa_loss=0.0001375, whisper_loss=0.08865, over 3795863.42 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:43:15,304 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 21 from LS+wenet, 26 from Vox, 36 from AS 2024-08-20 22:43:15,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4987190.0, ans=0.125 2024-08-20 22:43:34,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4987290.0, ans=0.2 2024-08-20 22:44:00,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4987390.0, ans=0.2 2024-08-20 22:44:09,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4987490.0, ans=0.125 2024-08-20 22:44:15,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.197e+01 2.422e+01 2.722e+01 3.881e+01, threshold=4.844e+01, percent-clipped=0.0 2024-08-20 22:44:21,116 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 22:44:36,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4987590.0, ans=0.1 2024-08-20 22:44:41,303 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9800, loss[loss=0.07295, beats_loss=0.01237, ecapa_loss=0.0001546, whisper_loss=0.05904, over 22423.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01053, ecapa_loss=0.0001384, whisper_loss=0.08829, over 3794967.17 frames. ], batch size: 96, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:44:41,493 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 29 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 22:44:55,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4987690.0, ans=0.2 2024-08-20 22:44:59,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4987790.0, ans=0.125 2024-08-20 22:45:07,342 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5 2024-08-20 22:45:08,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4987790.0, ans=0.125 2024-08-20 22:45:25,812 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 22:45:26,123 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4987890.0, ans=0.05 2024-08-20 22:45:32,848 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 22:45:48,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. 
limit=15.0 2024-08-20 22:45:52,032 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 22:46:06,812 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9850, loss[loss=0.1339, beats_loss=0.009195, ecapa_loss=0.0001587, whisper_loss=0.1231, over 22685.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.08883, over 3795437.12 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:46:16,025 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 22:46:21,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4988190.0, ans=0.0 2024-08-20 22:46:43,773 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-20 22:46:52,181 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 10 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 22:46:59,411 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 22:47:07,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.349e+01 2.590e+01 2.982e+01 4.203e+01, threshold=5.180e+01, percent-clipped=0.0 2024-08-20 22:47:11,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4988490.0, ans=10.0 2024-08-20 22:47:19,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4988590.0, ans=0.07 2024-08-20 22:47:21,236 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4988590.0, ans=0.0 2024-08-20 22:47:23,723 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 22:47:24,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4988590.0, ans=0.0 2024-08-20 22:47:34,503 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9900, loss[loss=0.09952, beats_loss=0.01017, ecapa_loss=0.0001397, whisper_loss=0.08795, over 18405.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.08906, over 3795281.24 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:47:34,758 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 22:47:36,005 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-20 22:48:06,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4988790.0, ans=0.0 2024-08-20 22:48:07,972 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 22:48:13,595 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 22:48:24,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4988890.0, ans=0.0 2024-08-20 22:48:32,490 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 22:48:37,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4988990.0, ans=0.1 2024-08-20 22:49:01,537 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 9950, loss[loss=0.07904, beats_loss=0.01273, ecapa_loss=9.775e-05, whisper_loss=0.06533, over 16384.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001384, whisper_loss=0.08892, over 3805004.83 frames. 
], batch size: 62, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:49:13,299 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 28 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-20 22:49:19,955 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4989290.0, ans=0.0 2024-08-20 22:49:21,978 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.067e+05 2024-08-20 22:49:28,410 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 22:49:30,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4989290.0, ans=0.1 2024-08-20 22:49:40,020 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4989390.0, ans=10.0 2024-08-20 22:49:54,311 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 22:50:02,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.629e+01 2.261e+01 2.497e+01 2.811e+01 6.221e+01, threshold=4.994e+01, percent-clipped=1.0 2024-08-20 22:50:15,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4989590.0, ans=0.0 2024-08-20 22:50:24,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4989590.0, ans=0.0 2024-08-20 22:50:28,713 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10000, loss[loss=0.09303, beats_loss=0.01083, ecapa_loss=0.0001172, whisper_loss=0.08103, over 14429.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01051, ecapa_loss=0.0001381, whisper_loss=0.08867, over 3777239.07 frames. 
], batch size: 56, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:50:40,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4989690.0, ans=0.125 2024-08-20 22:51:14,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-20 22:51:56,206 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-20 22:51:57,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4990190.0, ans=0.125 2024-08-20 22:51:58,426 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10050, loss[loss=0.1097, beats_loss=0.01027, ecapa_loss=0.0001472, whisper_loss=0.09793, over 21953.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.08996, over 3822536.63 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:52:09,693 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-20 22:52:10,062 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=8.0 2024-08-20 22:52:11,417 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 22:52:12,449 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07415100187063217, model_norm_threshold=49.94480514526367 2024-08-20 22:52:12,617 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.662e+05, grad_sumsq=1.662e+05, orig_rms_sq=1.000e+00 2024-08-20 22:52:27,234 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 22:52:29,884 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-20 22:52:40,984 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 19 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-20 22:52:46,211 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 22:52:57,078 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=12.0 2024-08-20 22:53:00,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.315e+01 2.541e+01 2.876e+01 6.736e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-20 22:53:16,243 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-08-20 22:53:18,063 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 22:53:20,689 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. 
limit=15.0 2024-08-20 22:53:21,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4990590.0, ans=0.125 2024-08-20 22:53:28,303 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10100, loss[loss=0.1029, beats_loss=0.007814, ecapa_loss=0.000182, whisper_loss=0.09329, over 13328.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.09057, over 3838132.84 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:53:34,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4990690.0, ans=0.125 2024-08-20 22:53:58,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4990790.0, ans=0.125 2024-08-20 22:54:06,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4990890.0, ans=0.0 2024-08-20 22:54:36,822 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 22:54:42,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4991090.0, ans=0.0 2024-08-20 22:54:47,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4991090.0, ans=0.125 2024-08-20 22:54:49,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4991090.0, ans=0.125 2024-08-20 22:54:54,996 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10150, loss[loss=0.1103, beats_loss=0.008903, ecapa_loss=0.0001478, whisper_loss=0.09992, over 17915.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001398, whisper_loss=0.08975, over 3829080.87 frames. 
], batch size: 71, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:55:07,577 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 22:55:19,915 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 22:55:20,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4991290.0, ans=0.125 2024-08-20 22:55:28,805 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 22:55:41,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4991390.0, ans=0.125 2024-08-20 22:55:48,566 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 22:55:56,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.245e+01 2.529e+01 2.822e+01 1.463e+02, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 22:56:00,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4991490.0, ans=0.125 2024-08-20 22:56:05,276 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-20 22:56:13,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4991590.0, ans=0.95 2024-08-20 22:56:14,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4991590.0, ans=0.125 2024-08-20 22:56:20,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4991590.0, ans=0.2 2024-08-20 22:56:22,412 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10200, loss[loss=0.09503, beats_loss=0.01169, ecapa_loss=0.0001303, whisper_loss=0.08204, over 19741.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.0898, over 3839396.69 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:56:38,651 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-20 22:56:52,763 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 22:57:33,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4992090.0, ans=0.125 2024-08-20 22:57:48,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.55 vs. limit=12.0 2024-08-20 22:57:54,886 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10250, loss[loss=0.09106, beats_loss=0.0113, ecapa_loss=0.0001213, whisper_loss=0.07854, over 12859.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.000138, whisper_loss=0.08972, over 3814566.97 frames. 
], batch size: 52, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:58:00,070 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2024-08-20 22:58:10,004 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 22:58:55,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4992490.0, ans=0.0 2024-08-20 22:58:55,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4992490.0, ans=0.0 2024-08-20 22:58:55,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4992490.0, ans=0.125 2024-08-20 22:58:55,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4992490.0, ans=15.0 2024-08-20 22:58:59,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.301e+01 2.546e+01 2.793e+01 3.961e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-20 22:59:18,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4992590.0, ans=0.125 2024-08-20 22:59:18,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4992590.0, ans=0.125 2024-08-20 22:59:26,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-20 22:59:26,744 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10300, loss[loss=0.08819, beats_loss=0.01109, ecapa_loss=0.0001206, whisper_loss=0.0759, over 15614.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.000138, whisper_loss=0.08947, over 3784214.49 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:59:27,310 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4992690.0, ans=0.5 2024-08-20 23:00:07,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4992890.0, ans=0.125 2024-08-20 23:00:19,152 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 14 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 23:00:24,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-20 23:00:55,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4993090.0, ans=0.125 2024-08-20 23:00:58,098 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10350, loss[loss=0.09539, beats_loss=0.01036, ecapa_loss=0.0001157, whisper_loss=0.08387, over 19682.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001376, whisper_loss=0.08935, over 3841391.72 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:01:02,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4993190.0, ans=0.0 2024-08-20 23:01:17,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-20 23:01:18,056 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 23:01:28,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-20 23:01:42,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4993390.0, ans=0.125 2024-08-20 23:01:53,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4993490.0, ans=0.025 2024-08-20 23:02:00,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4993490.0, ans=0.0 2024-08-20 23:02:01,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.402e+01 2.644e+01 3.028e+01 6.335e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-20 23:02:03,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4993490.0, ans=0.125 2024-08-20 23:02:07,626 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4993490.0, ans=0.2 2024-08-20 23:02:24,577 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4993590.0, ans=0.125 2024-08-20 23:02:24,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4993590.0, ans=0.1 2024-08-20 23:02:29,167 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10400, loss[loss=0.09787, beats_loss=0.01109, ecapa_loss=0.0001196, whisper_loss=0.08559, over 20181.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001375, whisper_loss=0.08972, over 3820723.27 frames. 
], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:02:32,076 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.922e-01 2024-08-20 23:02:33,063 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 23:02:33,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4993690.0, ans=0.125 2024-08-20 23:02:38,027 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 23:02:38,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4993690.0, ans=0.125 2024-08-20 23:02:56,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4993790.0, ans=0.125 2024-08-20 23:03:01,085 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-08-20 23:03:09,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4993890.0, ans=0.1 2024-08-20 23:03:42,435 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4994090.0, ans=0.0 2024-08-20 23:03:59,833 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10450, loss[loss=0.08971, beats_loss=0.01025, ecapa_loss=0.000106, whisper_loss=0.0784, over 22097.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001383, whisper_loss=0.08958, over 3805565.24 frames. 
], batch size: 84, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:04:11,127 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4994190.0, ans=0.0 2024-08-20 23:04:54,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4994490.0, ans=0.125 2024-08-20 23:04:54,509 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.376e+01 2024-08-20 23:04:55,401 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 28 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 23:05:01,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0 2024-08-20 23:05:04,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.282e+01 2.461e+01 2.662e+01 8.138e+01, threshold=4.922e+01, percent-clipped=1.0 2024-08-20 23:05:26,148 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-20 23:05:31,035 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10500, loss[loss=0.1018, beats_loss=0.008852, ecapa_loss=0.0001896, whisper_loss=0.09106, over 18820.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001387, whisper_loss=0.08915, over 3807991.38 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:05:33,259 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 23:05:58,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.83 vs. 
limit=22.5 2024-08-20 23:06:15,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4994890.0, ans=0.1 2024-08-20 23:06:25,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=10.0 2024-08-20 23:06:46,892 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 23:06:58,689 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10550, loss[loss=0.08907, beats_loss=0.01267, ecapa_loss=0.0001282, whisper_loss=0.07512, over 22677.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.08941, over 3792598.32 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:07:19,356 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 23:07:29,705 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 36 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 23:07:30,295 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0 2024-08-20 23:07:59,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.287e+01 2.503e+01 2.760e+01 4.390e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-20 23:08:17,322 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 23:08:25,618 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10600, loss[loss=0.09298, beats_loss=0.01, ecapa_loss=0.0001329, whisper_loss=0.08165, over 18996.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.08921, over 3757443.93 frames. 
], batch size: 73, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:08:55,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-20 23:09:00,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4995790.0, ans=0.125 2024-08-20 23:09:04,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2024-08-20 23:09:15,257 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4995890.0, ans=0.125 2024-08-20 23:09:59,560 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10650, loss[loss=0.09852, beats_loss=0.01093, ecapa_loss=0.0001109, whisper_loss=0.08647, over 14010.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.08946, over 3745183.64 frames. ], batch size: 52, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:10:00,292 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.222e+00 2024-08-20 23:10:08,861 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 32 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 23:10:12,136 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 23:10:38,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4996390.0, ans=0.0 2024-08-20 23:10:43,156 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-08-20 23:10:46,696 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 23:11:06,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.258e+01 2.484e+01 2.690e+01 6.351e+01, threshold=4.969e+01, percent-clipped=1.0 2024-08-20 23:11:08,781 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 23:11:11,820 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 23:11:16,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4996590.0, ans=0.125 2024-08-20 23:11:29,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4996590.0, ans=0.125 2024-08-20 23:11:33,154 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10700, loss[loss=0.1075, beats_loss=0.01065, ecapa_loss=0.0001262, whisper_loss=0.09554, over 22771.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001393, whisper_loss=0.0895, over 3770549.21 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:11:39,471 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 11 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 23:12:11,979 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 23:12:27,360 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 23:12:32,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. 
limit=15.0 2024-08-20 23:12:35,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4996990.0, ans=0.0 2024-08-20 23:12:41,426 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4996990.0, ans=0.07 2024-08-20 23:12:49,174 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-20 23:12:59,730 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4997090.0, ans=0.1 2024-08-20 23:13:06,188 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10750, loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001716, whisper_loss=0.09179, over 22296.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08905, over 3759434.55 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:13:13,925 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=8.0 2024-08-20 23:13:17,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4997190.0, ans=0.0 2024-08-20 23:13:25,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4997290.0, ans=0.125 2024-08-20 23:13:44,534 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 10 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 23:13:47,877 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4997390.0, ans=0.125 2024-08-20 23:13:56,564 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
24 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-20 23:14:18,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.299e+01 2.568e+01 2.833e+01 1.794e+02, threshold=5.137e+01, percent-clipped=1.0 2024-08-20 23:14:33,329 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4997590.0, ans=0.0 2024-08-20 23:14:35,848 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 23:14:39,731 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 23:14:46,995 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10800, loss[loss=0.1216, beats_loss=0.008455, ecapa_loss=0.0001537, whisper_loss=0.1116, over 22119.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.08937, over 3731617.89 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:15:25,598 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 39 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 23:15:47,968 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 15 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 23:15:57,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4997990.0, ans=0.125 2024-08-20 23:16:07,968 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 23:16:20,526 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10850, loss[loss=0.07658, beats_loss=0.01051, ecapa_loss=0.0001555, whisper_loss=0.06452, over 14639.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.0896, over 3751184.61 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:16:25,761 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 23:16:26,573 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4998190.0, ans=0.5 2024-08-20 23:16:42,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4998290.0, ans=0.2 2024-08-20 23:16:42,357 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5 2024-08-20 23:16:59,601 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4998390.0, ans=0.0 2024-08-20 23:17:12,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4998390.0, ans=0.125 2024-08-20 23:17:21,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4998490.0, ans=0.1 2024-08-20 23:17:24,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.245e+01 2.613e+01 2.967e+01 9.176e+01, threshold=5.227e+01, percent-clipped=1.0 2024-08-20 23:17:27,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2024-08-20 23:17:51,205 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10900, loss[loss=0.07948, beats_loss=0.01364, ecapa_loss=9.98e-05, whisper_loss=0.06485, over 15642.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001406, whisper_loss=0.0905, over 3764192.63 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:18:00,576 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 23:18:10,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4998790.0, ans=0.125 2024-08-20 23:18:13,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4998790.0, ans=0.125 2024-08-20 23:18:20,615 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4998790.0, ans=0.125 2024-08-20 23:19:21,336 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 10950, loss[loss=0.094, beats_loss=0.01086, ecapa_loss=0.0001099, whisper_loss=0.08204, over 21932.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.09005, over 3786010.52 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:19:29,247 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 23:19:52,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=4999290.0, ans=0.02 2024-08-20 23:20:22,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4999490.0, ans=0.05 2024-08-20 23:20:25,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.300e+01 2.546e+01 2.869e+01 3.848e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-20 23:20:32,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4999490.0, ans=0.125 2024-08-20 23:20:36,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4999590.0, ans=0.07 2024-08-20 23:20:51,483 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 23:20:52,491 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11000, loss[loss=0.08727, beats_loss=0.01096, ecapa_loss=0.0001312, whisper_loss=0.075, over 14511.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.09058, over 3830068.33 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:21:13,265 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 23:21:33,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4999890.0, ans=0.04949747468305833 2024-08-20 23:21:35,591 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 23:22:17,411 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 23:22:18,865 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 23:22:27,406 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11050, loss[loss=0.09552, beats_loss=0.009802, ecapa_loss=0.0001539, whisper_loss=0.08418, over 16032.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001392, whisper_loss=0.09086, over 3849685.46 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:22:49,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5000290.0, ans=0.1 2024-08-20 23:23:03,722 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-08-20 23:23:14,836 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. 
limit=15.0 2024-08-20 23:23:18,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=5000390.0, ans=15.0 2024-08-20 23:23:23,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2024-08-20 23:23:31,305 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 23:23:32,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.245e+01 2.536e+01 2.821e+01 3.723e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-20 23:23:56,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5000590.0, ans=0.2 2024-08-20 23:24:01,821 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11100, loss[loss=0.1161, beats_loss=0.009544, ecapa_loss=0.0001272, whisper_loss=0.1053, over 20877.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001381, whisper_loss=0.09055, over 3865280.37 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:24:12,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5000690.0, ans=0.125 2024-08-20 23:24:58,812 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 18 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-20 23:25:07,848 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5000990.0, ans=0.0 2024-08-20 23:25:25,131 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-08-20 23:25:27,605 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 23:25:32,656 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 23:25:37,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5001090.0, ans=0.0 2024-08-20 23:25:40,220 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11150, loss[loss=0.09145, beats_loss=0.01242, ecapa_loss=0.0001244, whisper_loss=0.07779, over 21065.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01043, ecapa_loss=0.0001375, whisper_loss=0.09192, over 3844102.08 frames. ], batch size: 86, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:26:01,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5001290.0, ans=0.125 2024-08-20 23:26:38,736 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2024-08-20 23:26:44,395 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.389e+01 2.664e+01 3.018e+01 8.039e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-20 23:26:45,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-20 23:26:53,361 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 13 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 23:27:14,921 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11200, loss[loss=0.09218, beats_loss=0.01052, ecapa_loss=0.0001348, whisper_loss=0.0803, over 20615.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001374, whisper_loss=0.09146, over 3854430.90 frames. ], batch size: 82, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:27:20,123 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 23:27:37,218 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 23:27:41,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5001790.0, ans=0.5 2024-08-20 23:28:04,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5001890.0, ans=0.0 2024-08-20 23:28:20,387 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 23:28:22,194 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 23:28:35,317 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5002090.0, ans=0.125 2024-08-20 23:28:37,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5002090.0, ans=0.125 2024-08-20 23:28:47,408 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11250, loss[loss=0.1285, beats_loss=0.006074, ecapa_loss=0.0001496, whisper_loss=0.1209, over 15518.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01043, ecapa_loss=0.0001381, whisper_loss=0.09121, over 3870953.64 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:28:49,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5002190.0, ans=0.125 2024-08-20 23:29:39,113 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
20 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-20 23:29:54,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.254e+01 2.512e+01 2.929e+01 3.894e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-20 23:30:16,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5002590.0, ans=0.2 2024-08-20 23:30:22,468 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11300, loss[loss=0.1005, beats_loss=0.008599, ecapa_loss=0.0001625, whisper_loss=0.09023, over 16570.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.09097, over 3845345.43 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:30:27,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5002690.0, ans=0.1 2024-08-20 23:30:30,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-20 23:31:02,768 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.174e+05 2024-08-20 23:31:26,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5002990.0, ans=0.0 2024-08-20 23:31:32,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5002990.0, ans=0.2 2024-08-20 23:31:48,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5003090.0, ans=0.1 2024-08-20 23:31:56,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2024-08-20 23:32:09,476 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11350, loss[loss=0.09375, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.08203, over 14140.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001381, whisper_loss=0.09069, over 3824746.07 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:32:42,751 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 23:32:45,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5003390.0, ans=0.0 2024-08-20 23:33:00,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5003390.0, ans=0.125 2024-08-20 23:33:00,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5003390.0, ans=0.125 2024-08-20 23:33:14,656 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 23:33:16,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.267e+01 2.559e+01 2.926e+01 1.468e+02, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 23:33:22,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5003490.0, ans=0.125 2024-08-20 23:33:31,079 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.93 vs. limit=5.0 2024-08-20 23:33:43,994 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11400, loss[loss=0.09401, beats_loss=0.01231, ecapa_loss=0.0001298, whisper_loss=0.0804, over 20041.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001387, whisper_loss=0.09036, over 3807720.87 frames. 
], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:33:49,278 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 23:33:56,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5003690.0, ans=0.0 2024-08-20 23:34:04,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5003790.0, ans=0.0 2024-08-20 23:34:08,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5003790.0, ans=0.0 2024-08-20 23:34:14,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-08-20 23:34:21,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-20 23:34:28,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5003890.0, ans=0.2 2024-08-20 23:34:36,990 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5003890.0, ans=0.0 2024-08-20 23:35:16,172 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11450, loss[loss=0.08332, beats_loss=0.009958, ecapa_loss=0.0001521, whisper_loss=0.07184, over 13904.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.09031, over 3801303.64 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:35:21,276 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 23:35:30,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5004190.0, ans=0.0 2024-08-20 23:35:38,182 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-20 23:35:39,711 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5004290.0, ans=0.125 2024-08-20 23:35:48,814 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5004290.0, ans=0.0 2024-08-20 23:35:56,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5004390.0, ans=0.0 2024-08-20 23:36:01,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5004390.0, ans=0.125 2024-08-20 23:36:22,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5004490.0, ans=0.125 2024-08-20 23:36:31,464 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.263e+01 2.473e+01 2.920e+01 3.744e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-20 23:36:59,827 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11500, loss[loss=0.1176, beats_loss=0.008323, ecapa_loss=0.0001304, whisper_loss=0.1079, over 17682.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.09006, over 3828418.74 frames. 
], batch size: 64, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:37:10,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5004690.0, ans=0.125 2024-08-20 23:37:27,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5004790.0, ans=0.05 2024-08-20 23:37:46,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5004890.0, ans=0.0 2024-08-20 23:37:59,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=5004990.0, ans=10.0 2024-08-20 23:38:09,424 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5004990.0, ans=0.125 2024-08-20 23:38:10,613 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 29 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 23:38:38,281 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11550, loss[loss=0.09477, beats_loss=0.01113, ecapa_loss=0.0001403, whisper_loss=0.08224, over 23827.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.09043, over 3820815.55 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:38:39,599 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5005190.0, ans=0.1 2024-08-20 23:38:42,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.27 vs. 
limit=12.0 2024-08-20 23:38:49,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5005190.0, ans=0.2 2024-08-20 23:38:51,850 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0 2024-08-20 23:39:04,224 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 23:39:27,825 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=5005390.0, ans=0.5 2024-08-20 23:39:29,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5005390.0, ans=0.125 2024-08-20 23:39:45,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.283e+01 2.508e+01 2.789e+01 4.307e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 23:39:46,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5005490.0, ans=0.125 2024-08-20 23:39:55,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5005590.0, ans=0.125 2024-08-20 23:40:16,542 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11600, loss[loss=0.1084, beats_loss=0.009013, ecapa_loss=0.0001376, whisper_loss=0.09806, over 22230.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01029, ecapa_loss=0.0001394, whisper_loss=0.09079, over 3839627.66 frames. 
], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:40:41,923 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5005790.0, ans=0.125 2024-08-20 23:40:51,859 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-20 23:41:14,532 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5005990.0, ans=0.125 2024-08-20 23:41:30,962 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:41:36,491 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2024-08-20 23:41:43,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5006090.0, ans=0.0 2024-08-20 23:41:49,170 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 30 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 23:41:50,157 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11650, loss[loss=0.09688, beats_loss=0.01141, ecapa_loss=0.0001726, whisper_loss=0.08374, over 21235.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001389, whisper_loss=0.09045, over 3807452.27 frames. ], batch size: 95, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:41:50,344 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 23:41:54,085 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 23:42:09,223 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
26 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-20 23:42:09,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5006290.0, ans=0.125 2024-08-20 23:42:48,689 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 17 from LS+wenet, 24 from Vox, 52 fro AS 2024-08-20 23:42:53,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.472e+01 2.706e+01 2.968e+01 4.797e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-20 23:43:06,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5006590.0, ans=0.0 2024-08-20 23:43:09,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5006590.0, ans=0.125 2024-08-20 23:43:16,871 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 23:43:21,204 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11700, loss[loss=0.1055, beats_loss=0.009556, ecapa_loss=0.0001724, whisper_loss=0.09424, over 22813.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.09023, over 3818600.69 frames. ], batch size: 96, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:43:25,145 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5006690.0, ans=0.0 2024-08-20 23:43:38,298 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5006690.0, ans=0.125 2024-08-20 23:44:13,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5006890.0, ans=0.2 2024-08-20 23:44:41,204 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 23:44:51,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5007090.0, ans=0.125 2024-08-20 23:44:55,459 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 23:44:56,027 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5007190.0, ans=0.125 2024-08-20 23:44:57,130 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11750, loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.0001091, whisper_loss=0.09102, over 17106.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001378, whisper_loss=0.09015, over 3824655.89 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:45:06,506 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.943e+00 2024-08-20 23:45:16,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5007290.0, ans=0.05 2024-08-20 23:45:35,602 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08465281873941422, model_norm_threshold=54.12553405761719 2024-08-20 23:45:35,772 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.07, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.880e+04, grad_sumsq=2.880e+04, orig_rms_sq=1.000e+00 2024-08-20 23:45:37,853 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 23:46:01,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.339e+01 2.549e+01 2.970e+01 6.394e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-20 23:46:12,267 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 23:46:16,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.86 vs. limit=6.0 2024-08-20 23:46:17,675 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 23:46:32,072 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11800, loss[loss=0.1174, beats_loss=0.008686, ecapa_loss=0.0001536, whisper_loss=0.1071, over 18909.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.09045, over 3878227.09 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:46:33,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5007690.0, ans=0.0 2024-08-20 23:46:36,140 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 23:47:05,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5007790.0, ans=0.2 2024-08-20 23:47:24,588 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5007890.0, ans=0.1 2024-08-20 23:47:26,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=5007890.0, ans=0.05 2024-08-20 23:47:44,112 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5007990.0, ans=0.1 2024-08-20 23:47:45,445 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 23:47:57,082 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 23:48:01,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5008090.0, ans=0.1 2024-08-20 23:48:11,481 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11850, loss[loss=0.09138, beats_loss=0.007754, ecapa_loss=0.0001523, whisper_loss=0.0821, over 16502.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001384, whisper_loss=0.09028, over 3910848.19 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:48:24,541 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 23:48:37,724 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 23:49:09,534 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2024-08-20 23:49:20,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.312e+01 2.598e+01 2.861e+01 4.202e+01, threshold=5.196e+01, percent-clipped=0.0 2024-08-20 23:49:22,421 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 23:49:51,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11900, loss[loss=0.1121, beats_loss=0.00903, ecapa_loss=0.0001427, whisper_loss=0.1016, over 22051.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001384, whisper_loss=0.08987, over 3893950.36 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:50:20,242 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.20 vs. 
limit=22.5 2024-08-20 23:50:24,619 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.291e+00 2024-08-20 23:50:37,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5008890.0, ans=0.125 2024-08-20 23:51:06,428 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 21 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 23:51:29,139 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 11950, loss[loss=0.1054, beats_loss=0.01144, ecapa_loss=0.0001342, whisper_loss=0.09261, over 23132.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01059, ecapa_loss=0.0001392, whisper_loss=0.08922, over 3879500.48 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:51:33,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-20 23:51:41,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5009190.0, ans=0.0 2024-08-20 23:51:51,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5009190.0, ans=0.0 2024-08-20 23:52:07,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5009290.0, ans=0.1 2024-08-20 23:52:09,286 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 21 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 23:52:20,303 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 21 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-20 23:52:22,211 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:52:23,930 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 
30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 23:52:27,159 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 23:52:27,358 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5009390.0, ans=0.125 2024-08-20 23:52:40,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.309e+01 2.519e+01 2.758e+01 3.846e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 23:52:41,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5009490.0, ans=0.0 2024-08-20 23:52:47,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5009490.0, ans=0.1 2024-08-20 23:53:07,263 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12000, loss[loss=0.09029, beats_loss=0.009727, ecapa_loss=0.0001778, whisper_loss=0.07879, over 16060.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01065, ecapa_loss=0.0001385, whisper_loss=0.08919, over 3886775.14 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:53:07,264 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-20 23:53:44,001 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2522, over 931116.00 frames. 2024-08-20 23:54:09,225 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on SV_voxceleb1: loss=0.003964, beats_loss=0, ecapa_loss=0.0003964, whisper_loss=0, over 944235.00 frames. 2024-08-20 23:55:45,798 INFO [train_multi_KD3.py:1150] (2/4) Epoch 34, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 23:55:45,802 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-20 23:55:54,437 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5009690.0, ans=0.125 2024-08-20 23:56:00,899 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 16 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-20 23:56:04,579 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5009790.0, ans=0.07 2024-08-20 23:56:20,406 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 11 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-20 23:56:36,572 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 23:56:40,538 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-08-20 23:56:41,856 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-20 23:56:45,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-20 23:56:51,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-08-20 23:57:11,222 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12050, loss[loss=0.1065, beats_loss=0.009976, ecapa_loss=0.0001328, whisper_loss=0.09515, over 21884.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.000138, whisper_loss=0.08961, over 3871584.95 frames. 
], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:57:17,118 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 23:57:42,792 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5010290.0, ans=0.0 2024-08-20 23:57:42,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5010290.0, ans=0.125 2024-08-20 23:57:46,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5010290.0, ans=0.0 2024-08-20 23:58:07,620 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 18 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-20 23:58:14,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.275e+01 2.516e+01 2.859e+01 1.031e+02, threshold=5.032e+01, percent-clipped=2.0 2024-08-20 23:58:17,015 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5010490.0, ans=0.125 2024-08-20 23:58:19,625 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 23:58:23,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5010590.0, ans=0.125 2024-08-20 23:58:29,876 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5010590.0, ans=0.125 2024-08-20 23:58:39,943 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12100, loss[loss=0.116, beats_loss=0.009939, ecapa_loss=0.0001206, whisper_loss=0.1049, over 19043.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.0001387, whisper_loss=0.08976, over 3838199.36 frames. 
], batch size: 74, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:58:41,528 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-20 23:58:44,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5010690.0, ans=0.125 2024-08-20 23:58:47,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-20 23:58:52,047 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 23:58:58,085 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 23:59:01,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5010790.0, ans=0.0 2024-08-20 23:59:32,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2024-08-20 23:59:39,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5010990.0, ans=0.125 2024-08-20 23:59:49,084 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 00:00:06,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5011090.0, ans=0.125 2024-08-21 00:00:12,311 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12150, loss[loss=0.1132, beats_loss=0.007849, ecapa_loss=0.0001611, whisper_loss=0.1038, over 22358.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001389, whisper_loss=0.08973, over 3827190.81 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:00:19,901 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 00:00:22,464 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-21 00:00:32,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5011290.0, ans=0.1 2024-08-21 00:00:35,642 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5011290.0, ans=0.125 2024-08-21 00:00:41,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5011290.0, ans=0.125 2024-08-21 00:00:55,855 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5011390.0, ans=0.2 2024-08-21 00:01:13,215 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 00:01:18,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.230e+01 2.539e+01 2.959e+01 2.449e+02, threshold=5.079e+01, percent-clipped=2.0 2024-08-21 00:01:18,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5011490.0, ans=0.09899494936611666 2024-08-21 00:01:18,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5011490.0, ans=0.125 2024-08-21 00:01:21,355 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.96 vs. 
limit=22.5 2024-08-21 00:01:29,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5011590.0, ans=0.2 2024-08-21 00:01:38,258 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-21 00:01:46,156 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12200, loss[loss=0.0986, beats_loss=0.01169, ecapa_loss=0.0001133, whisper_loss=0.08577, over 20430.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001394, whisper_loss=0.08938, over 3777248.51 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:02:24,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5011790.0, ans=0.2 2024-08-21 00:02:44,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0 2024-08-21 00:02:44,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=12.0 2024-08-21 00:02:59,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5011990.0, ans=0.125 2024-08-21 00:03:08,297 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 00:03:10,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5011990.0, ans=0.125 2024-08-21 00:03:15,606 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5012090.0, ans=0.2 2024-08-21 00:03:21,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5012090.0, ans=0.125 2024-08-21 00:03:24,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5012090.0, ans=0.2 2024-08-21 00:03:24,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=12.0 2024-08-21 00:03:33,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=12.0 2024-08-21 00:03:34,224 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12250, loss[loss=0.1004, beats_loss=0.0118, ecapa_loss=0.0001747, whisper_loss=0.08681, over 17914.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001395, whisper_loss=0.08946, over 3783389.82 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:03:45,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5012190.0, ans=0.125 2024-08-21 00:04:03,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5012290.0, ans=0.125 2024-08-21 00:04:31,452 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5012490.0, ans=0.125 2024-08-21 00:04:37,667 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 00:04:38,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.216e+01 2.475e+01 2.860e+01 1.621e+02, threshold=4.950e+01, percent-clipped=3.0 2024-08-21 00:04:55,059 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 32 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 00:05:03,160 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.525e+00 2024-08-21 00:05:05,780 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12300, loss[loss=0.1002, beats_loss=0.01139, ecapa_loss=0.0001178, whisper_loss=0.08766, over 17462.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.08938, over 3782994.02 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:05:16,576 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 26 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-21 00:05:20,425 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 00:05:41,006 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 28 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 00:05:42,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5012890.0, ans=0.2 2024-08-21 00:05:47,879 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2024-08-21 00:05:57,200 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.589e-02 2024-08-21 00:06:05,707 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 00:06:22,524 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5012990.0, ans=0.09899494936611666 2024-08-21 00:06:25,305 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 29 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-21 00:06:33,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=5013090.0, ans=0.05 2024-08-21 00:06:43,074 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12350, loss[loss=0.08916, beats_loss=0.0108, ecapa_loss=0.0001156, whisper_loss=0.0772, over 19599.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.08929, over 3792892.44 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:06:52,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-21 00:07:08,069 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0743364468216896, model_norm_threshold=49.50318145751953 2024-08-21 00:07:08,240 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.289e+04, grad_sumsq=4.289e+04, orig_rms_sq=1.000e+00 2024-08-21 00:07:26,092 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
20 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-21 00:07:28,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5013390.0, ans=0.125 2024-08-21 00:07:37,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5013490.0, ans=0.2 2024-08-21 00:07:45,211 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.283e+01 2.548e+01 2.937e+01 6.659e+02, threshold=5.096e+01, percent-clipped=4.0 2024-08-21 00:07:54,357 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-21 00:07:59,401 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 00:08:12,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12400, loss[loss=0.1126, beats_loss=0.007786, ecapa_loss=0.0001369, whisper_loss=0.1035, over 15260.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.000138, whisper_loss=0.08902, over 3766789.38 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:08:31,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5013690.0, ans=0.1 2024-08-21 00:08:50,612 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-21 00:09:10,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5013990.0, ans=0.125 2024-08-21 00:09:15,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5013990.0, ans=0.2 2024-08-21 00:09:28,718 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
23 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-21 00:09:47,375 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12450, loss[loss=0.08767, beats_loss=0.009004, ecapa_loss=0.000156, whisper_loss=0.07711, over 19338.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001383, whisper_loss=0.08881, over 3797891.25 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:10:16,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-08-21 00:10:25,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5014390.0, ans=0.0 2024-08-21 00:10:35,678 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-21 00:10:39,230 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=22.5 2024-08-21 00:10:51,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.484e+01 2.743e+01 3.672e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-21 00:10:59,339 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-08-21 00:11:06,067 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5014590.0, ans=0.1 2024-08-21 00:11:06,094 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.592e+00 2024-08-21 00:11:12,071 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
26 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-21 00:11:19,805 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12500, loss[loss=0.1318, beats_loss=0.008579, ecapa_loss=0.000121, whisper_loss=0.122, over 19120.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001368, whisper_loss=0.08936, over 3827090.51 frames. ], batch size: 72, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:11:38,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5014690.0, ans=0.2 2024-08-21 00:11:41,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.16 vs. limit=10.0 2024-08-21 00:11:44,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5014790.0, ans=0.125 2024-08-21 00:11:50,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5014790.0, ans=0.2 2024-08-21 00:11:55,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5014790.0, ans=0.1 2024-08-21 00:11:59,819 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 19 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-21 00:12:12,427 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 00:12:12,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5014890.0, ans=0.0 2024-08-21 00:12:18,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.81 vs. 
limit=15.0 2024-08-21 00:12:25,747 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=5014990.0, ans=22.5 2024-08-21 00:12:37,116 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-21 00:12:40,040 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-21 00:12:55,673 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12550, loss[loss=0.1012, beats_loss=0.009289, ecapa_loss=0.0001565, whisper_loss=0.09035, over 21755.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001369, whisper_loss=0.08971, over 3811652.90 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:13:06,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5015190.0, ans=0.2 2024-08-21 00:13:15,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5015290.0, ans=0.125 2024-08-21 00:13:19,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5015290.0, ans=0.125 2024-08-21 00:13:30,242 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-21 00:13:37,711 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
25 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-21 00:13:44,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5015390.0, ans=0.125 2024-08-21 00:13:51,667 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5015490.0, ans=0.125 2024-08-21 00:14:00,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.267e+01 2.470e+01 2.808e+01 4.015e+01, threshold=4.940e+01, percent-clipped=0.0 2024-08-21 00:14:08,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=12.0 2024-08-21 00:14:08,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-21 00:14:17,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5015590.0, ans=0.125 2024-08-21 00:14:28,237 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12600, loss[loss=0.07778, beats_loss=0.0128, ecapa_loss=0.0001102, whisper_loss=0.06388, over 15039.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001368, whisper_loss=0.08972, over 3804119.27 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:14:29,199 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2024-08-21 00:14:36,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5015690.0, ans=0.05 2024-08-21 00:14:37,923 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
27 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 00:14:39,546 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-21 00:14:44,734 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2024-08-21 00:14:49,320 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-21 00:15:13,442 WARNING [optim.py:496] (2/4) Scaling gradients by 0.00705720903351903, model_norm_threshold=49.39711380004883 2024-08-21 00:15:13,608 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.280e+06, grad_sumsq=7.678e+08, orig_rms_sq=1.078e-02 2024-08-21 00:15:22,965 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 38 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-21 00:15:23,505 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2024-08-21 00:15:34,971 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 00:16:01,766 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12650, loss[loss=0.1046, beats_loss=0.008299, ecapa_loss=0.0001943, whisper_loss=0.09436, over 19853.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001372, whisper_loss=0.08937, over 3786817.47 frames. ], batch size: 85, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:16:18,016 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-21 00:16:21,050 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.33 vs. 
limit=15.0 2024-08-21 00:16:43,701 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5016390.0, ans=0.125 2024-08-21 00:17:07,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.324e+01 2.529e+01 2.803e+01 7.000e+03, threshold=5.059e+01, percent-clipped=4.0 2024-08-21 00:17:09,877 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-21 00:17:18,914 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-21 00:17:26,899 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 00:17:36,740 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 00:17:38,227 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12700, loss[loss=0.07726, beats_loss=0.01233, ecapa_loss=0.0001486, whisper_loss=0.06344, over 13269.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.08924, over 3799005.28 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:17:47,387 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5016690.0, ans=0.125 2024-08-21 00:17:49,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5016690.0, ans=0.1 2024-08-21 00:17:57,241 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 34 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 00:18:03,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. 
limit=15.0 2024-08-21 00:18:19,682 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 00:18:34,667 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-21 00:18:35,395 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.53 vs. limit=22.5 2024-08-21 00:18:45,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5016990.0, ans=0.2 2024-08-21 00:18:52,378 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-21 00:19:11,153 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12750, loss[loss=0.09426, beats_loss=0.01031, ecapa_loss=0.0001258, whisper_loss=0.0827, over 17453.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001381, whisper_loss=0.08949, over 3790888.03 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:19:18,300 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 00:19:34,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5017290.0, ans=0.0 2024-08-21 00:19:44,865 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-21 00:19:54,521 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 00:19:55,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5017390.0, ans=0.125 2024-08-21 00:20:00,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5017390.0, ans=0.0 2024-08-21 00:20:19,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.263e+01 2.506e+01 2.738e+01 4.032e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-21 00:20:24,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5017490.0, ans=0.125 2024-08-21 00:20:33,904 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5017590.0, ans=0.04949747468305833 2024-08-21 00:20:45,831 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5017690.0, ans=0.015 2024-08-21 00:20:46,989 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12800, loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001309, whisper_loss=0.09234, over 22180.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001383, whisper_loss=0.09065, over 3809248.25 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:20:52,833 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 00:20:54,058 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2024-08-21 00:21:16,606 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.97 vs. 
limit=15.0 2024-08-21 00:22:07,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5018090.0, ans=0.125 2024-08-21 00:22:09,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5018090.0, ans=0.125 2024-08-21 00:22:28,810 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.14 vs. limit=10.0 2024-08-21 00:22:29,461 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12850, loss[loss=0.1064, beats_loss=0.008509, ecapa_loss=0.0001413, whisper_loss=0.09648, over 14073.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0103, ecapa_loss=0.0001388, whisper_loss=0.09107, over 3787803.02 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:22:42,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5018190.0, ans=0.07 2024-08-21 00:22:42,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5018190.0, ans=0.125 2024-08-21 00:23:14,137 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 00:23:30,138 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-21 00:23:30,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5018490.0, ans=0.125 2024-08-21 00:23:38,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.519e+01 2.755e+01 3.962e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-21 00:23:43,336 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. 
limit=15.0 2024-08-21 00:23:53,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5018590.0, ans=0.125 2024-08-21 00:24:05,466 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 19 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-21 00:24:10,687 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-21 00:24:12,762 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12900, loss[loss=0.107, beats_loss=0.008923, ecapa_loss=0.0001133, whisper_loss=0.09694, over 14857.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001381, whisper_loss=0.09108, over 3834150.63 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:24:26,153 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.08 vs. limit=15.0 2024-08-21 00:24:53,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2024-08-21 00:24:53,993 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-21 00:25:27,291 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.024e+00 2024-08-21 00:25:35,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5019090.0, ans=0.125 2024-08-21 00:25:40,534 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 30 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 00:25:48,068 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 12950, loss[loss=0.1107, beats_loss=0.01081, ecapa_loss=0.0001202, whisper_loss=0.09868, over 23474.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000138, whisper_loss=0.09051, over 3811152.12 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:26:01,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-21 00:26:03,094 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2024-08-21 00:26:59,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.232e+01 2.418e+01 2.707e+01 3.600e+01, threshold=4.835e+01, percent-clipped=0.0 2024-08-21 00:27:03,052 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 00:27:33,077 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13000, loss[loss=0.09639, beats_loss=0.008903, ecapa_loss=0.0001581, whisper_loss=0.08591, over 19719.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001384, whisper_loss=0.09016, over 3803998.75 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:27:53,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5019790.0, ans=0.09899494936611666 2024-08-21 00:28:03,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-08-21 00:28:50,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5019990.0, ans=0.125 2024-08-21 00:29:09,431 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 00:29:10,465 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13050, loss[loss=0.0955, beats_loss=0.01084, ecapa_loss=0.0001288, whisper_loss=0.08337, over 15475.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001388, whisper_loss=0.08963, over 3778582.43 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:29:20,448 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 00:29:28,996 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.636e+01 2024-08-21 00:29:40,822 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-21 00:29:50,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5020390.0, ans=0.1 2024-08-21 00:29:57,860 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5020390.0, ans=0.025 2024-08-21 00:29:59,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-21 00:30:15,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.250e+01 2.442e+01 2.810e+01 6.229e+01, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 00:30:16,648 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5020490.0, ans=0.2 2024-08-21 00:30:19,167 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.24 vs. 
limit=22.5 2024-08-21 00:30:27,730 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2024-08-21 00:30:44,118 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13100, loss[loss=0.1079, beats_loss=0.008588, ecapa_loss=0.0001259, whisper_loss=0.09801, over 16043.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08994, over 3786662.42 frames. ], batch size: 60, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:30:45,684 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-08-21 00:31:15,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2024-08-21 00:31:27,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5020890.0, ans=0.125 2024-08-21 00:31:32,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5020890.0, ans=0.2 2024-08-21 00:31:41,772 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.50 vs. limit=22.5 2024-08-21 00:31:49,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. limit=10.0 2024-08-21 00:31:55,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2024-08-21 00:32:03,971 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 00:32:09,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5021090.0, ans=0.0 2024-08-21 00:32:13,922 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09580767154693604, model_norm_threshold=48.835636138916016 2024-08-21 00:32:14,122 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.692e+04, grad_sumsq=4.692e+04, orig_rms_sq=1.000e+00 2024-08-21 00:32:19,227 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13150, loss[loss=0.08621, beats_loss=0.01322, ecapa_loss=0.0001231, whisper_loss=0.07176, over 23038.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001388, whisper_loss=0.08941, over 3787051.75 frames. ], batch size: 95, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:32:35,600 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-21 00:33:10,558 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=5021390.0, ans=0.02 2024-08-21 00:33:22,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.398e+01 2.549e+01 2.951e+01 5.097e+02, threshold=5.098e+01, percent-clipped=2.0 2024-08-21 00:33:38,694 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5021590.0, ans=0.95 2024-08-21 00:33:40,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5021590.0, ans=0.2 2024-08-21 00:33:46,317 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-21 00:33:52,579 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13200, loss[loss=0.1039, beats_loss=0.009025, ecapa_loss=0.0001413, whisper_loss=0.09349, over 23066.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001393, whisper_loss=0.09015, over 3786804.12 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:34:00,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-21 00:34:03,440 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 00:34:16,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5021790.0, ans=0.125 2024-08-21 00:34:19,159 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-21 00:34:56,638 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 00:35:17,118 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-21 00:35:17,923 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 00:35:21,069 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 00:35:27,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5022090.0, ans=0.1 2024-08-21 00:35:28,986 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-21 00:35:34,095 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13250, loss[loss=0.1113, beats_loss=0.01055, ecapa_loss=0.0001268, whisper_loss=0.09948, over 18001.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001385, whisper_loss=0.09029, over 3799564.29 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:35:37,047 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5022190.0, ans=0.125 2024-08-21 00:35:44,732 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 00:35:45,163 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.30 vs. limit=22.5 2024-08-21 00:35:47,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-08-21 00:35:52,280 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5022290.0, ans=0.0 2024-08-21 00:36:01,150 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-21 00:36:05,855 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2024-08-21 00:36:19,210 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 00:36:37,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.371e+01 2.560e+01 2.906e+01 3.702e+02, threshold=5.119e+01, percent-clipped=3.0 2024-08-21 00:36:51,904 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 00:37:00,469 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2024-08-21 00:37:08,378 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13300, loss[loss=0.08058, beats_loss=0.01064, ecapa_loss=0.0001477, whisper_loss=0.06846, over 14161.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.09022, over 3817944.23 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:37:42,862 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 27 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-21 00:37:48,307 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2024-08-21 00:37:54,735 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 00:38:01,876 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-21 00:38:08,368 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5022990.0, ans=0.025 2024-08-21 00:38:33,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5023090.0, ans=0.125 2024-08-21 00:38:43,913 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13350, loss[loss=0.09171, beats_loss=0.01115, ecapa_loss=0.0001221, whisper_loss=0.07934, over 16753.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001378, whisper_loss=0.08975, over 3815548.32 frames. 
], batch size: 66, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:38:48,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5023190.0, ans=0.0 2024-08-21 00:38:56,038 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-21 00:38:57,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5023190.0, ans=0.0 2024-08-21 00:39:05,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5023290.0, ans=0.125 2024-08-21 00:39:05,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5023290.0, ans=0.09899494936611666 2024-08-21 00:39:09,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-21 00:39:12,074 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 00:39:24,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5023390.0, ans=0.0 2024-08-21 00:39:35,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2024-08-21 00:39:38,100 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 00:39:47,885 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 
18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 00:39:49,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.315e+01 2.502e+01 2.886e+01 3.923e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 00:39:52,134 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 00:39:57,346 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 00:40:08,339 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5023590.0, ans=0.125 2024-08-21 00:40:17,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5023690.0, ans=0.0 2024-08-21 00:40:18,748 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13400, loss[loss=0.1113, beats_loss=0.009479, ecapa_loss=0.0001509, whisper_loss=0.1003, over 17047.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0103, ecapa_loss=0.0001385, whisper_loss=0.09022, over 3780003.94 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:40:28,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5023690.0, ans=0.125 2024-08-21 00:40:33,360 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 26 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-21 00:40:37,934 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 00:40:42,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2024-08-21 00:41:13,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.34 vs. 
limit=15.0 2024-08-21 00:41:25,102 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5023990.0, ans=0.0 2024-08-21 00:41:28,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5024090.0, ans=0.125 2024-08-21 00:41:46,038 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 16 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-21 00:41:47,595 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13450, loss[loss=0.08632, beats_loss=0.009418, ecapa_loss=0.0001575, whisper_loss=0.07532, over 15274.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01028, ecapa_loss=0.0001399, whisper_loss=0.09021, over 3777127.85 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:42:12,683 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 24 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-21 00:42:14,274 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 00:42:20,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-21 00:42:38,545 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-21 00:42:51,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.233e+01 2.383e+01 2.688e+01 3.683e+01, threshold=4.765e+01, percent-clipped=0.0 2024-08-21 00:43:02,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5024590.0, ans=0.1 2024-08-21 00:43:09,680 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
17 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-21 00:43:22,798 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13500, loss[loss=0.1105, beats_loss=0.008284, ecapa_loss=0.0001335, whisper_loss=0.1009, over 14058.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01027, ecapa_loss=0.0001388, whisper_loss=0.09073, over 3772364.01 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:43:23,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5024690.0, ans=0.125 2024-08-21 00:43:39,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5024790.0, ans=0.125 2024-08-21 00:43:40,525 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5024790.0, ans=0.1 2024-08-21 00:43:40,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5024790.0, ans=0.125 2024-08-21 00:43:42,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5024790.0, ans=0.0 2024-08-21 00:43:48,659 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=5024790.0, ans=0.2 2024-08-21 00:44:18,229 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=12.0 2024-08-21 00:44:40,025 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-21 00:44:42,085 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 00:44:56,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5025090.0, ans=0.09899494936611666 2024-08-21 00:44:59,204 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13550, loss[loss=0.1065, beats_loss=0.01102, ecapa_loss=0.0001289, whisper_loss=0.09422, over 21499.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001379, whisper_loss=0.09045, over 3787811.89 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:45:05,285 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5025190.0, ans=0.0 2024-08-21 00:45:09,848 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 11 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-21 00:45:16,620 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 00:45:55,802 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=15.0 2024-08-21 00:45:58,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5025490.0, ans=0.2 2024-08-21 00:46:01,207 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 
21 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-21 00:46:01,665 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5025490.0, ans=0.1 2024-08-21 00:46:05,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5025490.0, ans=0.0 2024-08-21 00:46:07,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.237e+01 2.540e+01 2.882e+01 4.860e+01, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 00:46:08,998 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5025490.0, ans=0.0 2024-08-21 00:46:34,837 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13600, loss[loss=0.1205, beats_loss=0.008317, ecapa_loss=0.0001639, whisper_loss=0.1106, over 22345.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01031, ecapa_loss=0.0001389, whisper_loss=0.09012, over 3781915.75 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:46:45,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5025690.0, ans=0.2 2024-08-21 00:46:48,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5025690.0, ans=0.09899494936611666 2024-08-21 00:47:05,342 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5025790.0, ans=0.0 2024-08-21 00:47:14,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5025890.0, ans=0.0 2024-08-21 00:47:15,586 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 
15 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-21 00:47:17,466 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-21 00:48:00,910 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2024-08-21 00:48:09,543 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13650, loss[loss=0.08372, beats_loss=0.01018, ecapa_loss=0.0001327, whisper_loss=0.07221, over 16634.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001383, whisper_loss=0.08981, over 3756888.08 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:48:27,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5026190.0, ans=0.0 2024-08-21 00:48:31,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5026290.0, ans=0.1 2024-08-21 00:48:53,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5026390.0, ans=0.125 2024-08-21 00:49:22,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.304e+01 2.539e+01 2.805e+01 5.664e+01, threshold=5.078e+01, percent-clipped=1.0 2024-08-21 00:49:52,288 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13700, loss[loss=0.0809, beats_loss=0.01023, ecapa_loss=0.0001775, whisper_loss=0.06889, over 21048.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01027, ecapa_loss=0.0001391, whisper_loss=0.08983, over 3731752.66 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:50:07,758 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 00:50:11,287 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-21 00:50:16,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5026790.0, ans=0.125 2024-08-21 00:50:55,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-21 00:51:01,121 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=5026990.0, ans=0.02 2024-08-21 00:51:23,977 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5027090.0, ans=0.1 2024-08-21 00:51:32,322 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13750, loss[loss=0.08587, beats_loss=0.0102, ecapa_loss=0.0001453, whisper_loss=0.07422, over 21383.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001396, whisper_loss=0.08972, over 3756718.31 frames. ], batch size: 89, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:51:39,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5027190.0, ans=0.1 2024-08-21 00:51:41,853 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5027190.0, ans=0.125 2024-08-21 00:52:03,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5027290.0, ans=0.2 2024-08-21 00:52:16,624 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-21 00:52:22,806 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.87 vs. 
limit=22.5 2024-08-21 00:52:45,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.353e+01 2.699e+01 3.002e+01 5.030e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-21 00:53:06,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-21 00:53:16,614 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13800, loss[loss=0.102, beats_loss=0.01098, ecapa_loss=0.00012, whisper_loss=0.08986, over 22633.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001387, whisper_loss=0.08922, over 3772570.77 frames. ], batch size: 87, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:53:25,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5027690.0, ans=0.125 2024-08-21 00:53:27,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5027690.0, ans=0.0 2024-08-21 00:53:28,391 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS 2024-08-21 00:53:36,691 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5027790.0, ans=0.1 2024-08-21 00:53:40,249 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5027790.0, ans=0.2 2024-08-21 00:53:57,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5027890.0, ans=0.125 2024-08-21 00:53:57,521 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.74 vs. 
limit=15.0 2024-08-21 00:54:05,580 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5027890.0, ans=0.1 2024-08-21 00:54:15,825 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 30 from LS+wenet, 27 from Vox, 38 from AS 2024-08-21 00:54:17,443 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 12 from LS+wenet, 16 from Vox, 22 from AS 2024-08-21 00:54:27,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2024-08-21 00:54:33,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5028090.0, ans=0.0 2024-08-21 00:54:46,644 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-21 00:54:49,144 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13850, loss[loss=0.1118, beats_loss=0.009819, ecapa_loss=0.0001631, whisper_loss=0.1003, over 18045.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.08942, over 3785147.16 frames. ], batch size: 74, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:55:01,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5028190.0, ans=0.125 2024-08-21 00:55:06,136 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5028190.0, ans=0.1 2024-08-21 00:55:06,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.33 vs. 
limit=10.0 2024-08-21 00:55:10,611 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5028290.0, ans=10.0 2024-08-21 00:55:41,363 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.52 vs. limit=15.0 2024-08-21 00:55:58,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.193e+01 2.425e+01 2.661e+01 8.724e+01, threshold=4.850e+01, percent-clipped=1.0 2024-08-21 00:55:59,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5028490.0, ans=0.125 2024-08-21 00:56:01,680 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=22.5 2024-08-21 00:56:25,535 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13900, loss[loss=0.09243, beats_loss=0.01232, ecapa_loss=0.0001396, whisper_loss=0.07872, over 22097.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.08956, over 3795815.84 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:56:29,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5028690.0, ans=0.125 2024-08-21 00:56:33,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=12.0 2024-08-21 00:56:34,927 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 12 from LS+wenet, 22 from Vox, 19 from AS 2024-08-21 00:57:08,074 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
21 from LS+wenet, 10 from Vox, 24 from AS 2024-08-21 00:57:12,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5028890.0, ans=0.0 2024-08-21 00:57:22,514 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 from AS 2024-08-21 00:57:57,550 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 13950, loss[loss=0.1142, beats_loss=0.008504, ecapa_loss=0.0001539, whisper_loss=0.1042, over 22482.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.00014, whisper_loss=0.08961, over 3796173.24 frames. ], batch size: 88, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 00:57:58,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2024-08-21 00:58:17,818 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.419e+00 2024-08-21 00:58:20,497 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5029290.0, ans=0.025 2024-08-21 00:58:32,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5029290.0, ans=0.0 2024-08-21 00:58:37,473 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
26 from LS+wenet, 23 from Vox, 30 from AS 2024-08-21 00:58:40,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5029390.0, ans=0.2 2024-08-21 00:58:56,745 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5029390.0, ans=0.0 2024-08-21 00:58:58,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5029390.0, ans=0.2 2024-08-21 00:58:59,237 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2024-08-21 00:59:01,945 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 15 from LS+wenet, 17 from Vox, 39 from AS 2024-08-21 00:59:12,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.187e+01 2.492e+01 2.707e+01 4.586e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 00:59:13,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5029490.0, ans=0.2 2024-08-21 00:59:21,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5029590.0, ans=0.0 2024-08-21 00:59:32,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5029590.0, ans=0.0 2024-08-21 00:59:43,122 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14000, loss[loss=0.1273, beats_loss=0.009485, ecapa_loss=0.000137, whisper_loss=0.1165, over 22537.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001382, whisper_loss=0.08936, over 3808298.96 frames. 
], batch size: 88, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 00:59:53,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5029690.0, ans=0.125 2024-08-21 01:00:16,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5029790.0, ans=0.125 2024-08-21 01:00:19,279 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 from AS 2024-08-21 01:00:26,728 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 from AS 2024-08-21 01:00:54,329 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 9 from Vox, 26 from AS 2024-08-21 01:00:59,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5029990.0, ans=0.04949747468305833 2024-08-21 01:01:03,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5029990.0, ans=0.0 2024-08-21 01:01:30,782 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14050, loss[loss=0.09041, beats_loss=0.009615, ecapa_loss=0.0001776, whisper_loss=0.07902, over 18922.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08927, over 3809543.74 frames. ], batch size: 79, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:01:44,343 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. 
limit=15.0 2024-08-21 01:02:00,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5030290.0, ans=0.0 2024-08-21 01:02:21,376 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-21 01:02:22,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5030390.0, ans=0.125 2024-08-21 01:02:35,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=5030490.0, ans=0.5 2024-08-21 01:02:36,562 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 20 from LS+wenet, 11 from Vox, 29 from AS 2024-08-21 01:02:42,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.263e+01 2.516e+01 2.757e+01 1.194e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-21 01:03:05,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5030590.0, ans=0.125 2024-08-21 01:03:13,526 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14100, loss[loss=0.1242, beats_loss=0.009967, ecapa_loss=0.0001287, whisper_loss=0.113, over 22008.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001385, whisper_loss=0.09012, over 3812234.65 frames. 
], batch size: 85, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:03:19,417 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5030690.0, ans=0.125 2024-08-21 01:03:21,322 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5030690.0, ans=0.1 2024-08-21 01:03:49,044 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5030790.0, ans=0.125 2024-08-21 01:04:12,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5030990.0, ans=0.05 2024-08-21 01:04:21,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5030990.0, ans=0.1 2024-08-21 01:04:29,949 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 from AS 2024-08-21 01:04:50,693 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14150, loss[loss=0.1015, beats_loss=0.009102, ecapa_loss=0.0001431, whisper_loss=0.09101, over 16881.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001385, whisper_loss=0.08994, over 3813488.09 frames. 
], batch size: 68, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:04:57,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5031190.0, ans=0.0 2024-08-21 01:05:11,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5031290.0, ans=0.125 2024-08-21 01:05:11,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=5031290.0, ans=15.0 2024-08-21 01:06:02,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.285e+01 2.566e+01 2.926e+01 5.021e+02, threshold=5.132e+01, percent-clipped=5.0 2024-08-21 01:06:06,685 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=15.0 2024-08-21 01:06:07,132 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 from AS 2024-08-21 01:06:07,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5031490.0, ans=0.125 2024-08-21 01:06:08,734 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS 2024-08-21 01:06:16,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5031590.0, ans=0.125 2024-08-21 01:06:24,757 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5031590.0, ans=0.125 2024-08-21 01:06:29,144 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=5031590.0, ans=10.0 2024-08-21 01:06:32,888 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 
17 from LS+wenet, 17 from Vox, 20 from AS 2024-08-21 01:06:35,452 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14200, loss[loss=0.09679, beats_loss=0.009273, ecapa_loss=0.000154, whisper_loss=0.08598, over 14193.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001389, whisper_loss=0.08998, over 3792001.40 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:06:41,316 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 27 from LS+wenet, 25 from Vox, 21 from AS 2024-08-21 01:06:43,081 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 from AS 2024-08-21 01:06:58,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=5031790.0, ans=22.5 2024-08-21 01:07:04,826 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 from AS 2024-08-21 01:07:05,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5031790.0, ans=0.2 2024-08-21 01:07:09,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5031790.0, ans=0.0 2024-08-21 01:07:11,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5031890.0, ans=0.05 2024-08-21 01:07:21,656 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5031890.0, ans=0.0 2024-08-21 01:07:32,620 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 from AS 2024-08-21 01:07:49,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5031990.0, ans=0.125 2024-08-21 01:07:53,784 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
33 from LS+wenet, 20 from Vox, 24 from AS 2024-08-21 01:07:57,868 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5032090.0, ans=0.1 2024-08-21 01:08:03,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5032090.0, ans=0.0 2024-08-21 01:08:09,675 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14250, loss[loss=0.08291, beats_loss=0.01228, ecapa_loss=0.0001173, whisper_loss=0.06946, over 21607.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01064, ecapa_loss=0.0001384, whisper_loss=0.08952, over 3777851.56 frames. ], batch size: 87, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:08:18,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5032190.0, ans=0.125 2024-08-21 01:08:26,364 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS 2024-08-21 01:08:32,034 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 25 from LS+wenet, 19 from Vox, 24 from AS 2024-08-21 01:09:04,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5032490.0, ans=0.0 2024-08-21 01:09:10,411 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5032490.0, ans=0.0 2024-08-21 01:09:11,292 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 12 from Vox, 31 from AS 2024-08-21 01:09:16,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.235e+01 2.454e+01 2.805e+01 6.761e+01, threshold=4.908e+01, percent-clipped=2.0 2024-08-21 01:09:24,225 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 from AS 2024-08-21 01:09:35,407 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
35 from LS+wenet, 19 from Vox, 37 from AS 2024-08-21 01:09:38,852 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-21 01:09:42,151 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14300, loss[loss=0.09099, beats_loss=0.01138, ecapa_loss=0.0001206, whisper_loss=0.0784, over 15094.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.000137, whisper_loss=0.09025, over 3763386.61 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:09:43,178 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5032690.0, ans=0.125 2024-08-21 01:09:46,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5032690.0, ans=0.0 2024-08-21 01:09:59,704 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5032790.0, ans=0.125 2024-08-21 01:10:04,352 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 21 from Vox, 28 from AS 2024-08-21 01:10:05,120 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=15.0 2024-08-21 01:10:10,427 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=15.0 2024-08-21 01:10:10,459 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-21 01:10:20,401 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
23 from LS+wenet, 25 from Vox, 44 from AS 2024-08-21 01:10:21,573 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:10:33,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5032890.0, ans=0.2 2024-08-21 01:10:34,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5032890.0, ans=0.1 2024-08-21 01:11:04,399 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5033090.0, ans=0.125 2024-08-21 01:11:18,145 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14350, loss[loss=0.1063, beats_loss=0.008888, ecapa_loss=0.0001825, whisper_loss=0.09559, over 15041.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001382, whisper_loss=0.09069, over 3730660.93 frames. ], batch size: 63, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:11:24,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5033190.0, ans=0.0 2024-08-21 01:11:33,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5033190.0, ans=0.125 2024-08-21 01:11:44,244 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5033290.0, ans=0.1 2024-08-21 01:11:57,794 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5033390.0, ans=0.0 2024-08-21 01:12:24,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.295e+01 2.538e+01 2.825e+01 4.751e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-21 01:12:36,734 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5033590.0, ans=0.2 2024-08-21 01:12:49,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2024-08-21 01:12:49,774 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14400, loss[loss=0.08933, beats_loss=0.009252, ecapa_loss=0.0001536, whisper_loss=0.07855, over 17960.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.000138, whisper_loss=0.09047, over 3731389.63 frames. ], batch size: 70, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:13:04,227 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 from AS 2024-08-21 01:13:17,724 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.23 vs. limit=10.0 2024-08-21 01:13:31,607 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 27 from LS+wenet, 13 from Vox, 35 from AS 2024-08-21 01:13:37,002 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5033890.0, ans=0.5 2024-08-21 01:13:49,930 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0 2024-08-21 01:14:00,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5033990.0, ans=0.1 2024-08-21 01:14:02,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5033990.0, ans=0.1 2024-08-21 01:14:32,006 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14450, loss[loss=0.09904, beats_loss=0.01104, ecapa_loss=0.0001403, whisper_loss=0.0866, over 18256.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001377, whisper_loss=0.09022, over 3709490.74 frames. ], batch size: 72, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:14:33,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5034190.0, ans=0.1 2024-08-21 01:14:35,222 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-21 01:14:45,237 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.481e+01 2024-08-21 01:14:58,315 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 from AS 2024-08-21 01:15:06,921 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5034290.0, ans=0.1 2024-08-21 01:15:33,210 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5034490.0, ans=0.1 2024-08-21 01:15:34,505 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 from AS 2024-08-21 01:15:34,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5034490.0, ans=0.125 2024-08-21 01:15:40,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.267e+01 2.442e+01 2.819e+01 1.713e+02, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 01:16:05,792 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14500, loss[loss=0.1006, beats_loss=0.01072, ecapa_loss=0.0001367, whisper_loss=0.08851, over 21036.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001377, whisper_loss=0.08995, over 3727816.94 frames. 
], batch size: 87, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:16:33,410 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 14 from LS+wenet, 16 from Vox, 33 from AS 2024-08-21 01:16:43,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5034890.0, ans=0.125 2024-08-21 01:16:51,424 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2024-08-21 01:16:55,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5034890.0, ans=0.125 2024-08-21 01:17:15,850 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 12 from LS+wenet, 16 from Vox, 25 from AS 2024-08-21 01:17:25,548 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:17:44,085 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14550, loss[loss=0.1073, beats_loss=0.01075, ecapa_loss=0.000121, whisper_loss=0.09534, over 17745.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.000138, whisper_loss=0.09004, over 3750560.05 frames. ], batch size: 69, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:17:53,723 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-21 01:18:23,421 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5035390.0, ans=0.125 2024-08-21 01:18:28,598 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
25 from LS+wenet, 29 from Vox, 30 from AS 2024-08-21 01:18:41,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=5035490.0, ans=0.5 2024-08-21 01:18:54,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.274e+01 2.528e+01 2.801e+01 4.515e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-21 01:19:01,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5035590.0, ans=0.1 2024-08-21 01:19:15,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5035590.0, ans=0.125 2024-08-21 01:19:21,682 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14600, loss[loss=0.1129, beats_loss=0.008958, ecapa_loss=0.0001522, whisper_loss=0.1024, over 23933.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.09002, over 3788620.18 frames. ], batch size: 95, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:19:42,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5035790.0, ans=0.2 2024-08-21 01:20:04,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5035890.0, ans=0.0 2024-08-21 01:20:04,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. 
limit=10.0 2024-08-21 01:20:19,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5035990.0, ans=0.0 2024-08-21 01:20:24,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5035990.0, ans=0.0 2024-08-21 01:20:57,012 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14650, loss[loss=0.08883, beats_loss=0.013, ecapa_loss=0.0001097, whisper_loss=0.07473, over 21179.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001378, whisper_loss=0.0897, over 3807682.51 frames. ], batch size: 84, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:20:57,517 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 from AS 2024-08-21 01:21:03,117 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-21 01:21:23,807 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5036290.0, ans=0.125 2024-08-21 01:21:27,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5036290.0, ans=0.125 2024-08-21 01:21:32,989 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 from AS 2024-08-21 01:22:06,128 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.288e+01 2.569e+01 2.805e+01 8.601e+01, threshold=5.137e+01, percent-clipped=2.0 2024-08-21 01:22:32,569 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14700, loss[loss=0.08457, beats_loss=0.01269, ecapa_loss=0.0001332, whisper_loss=0.07054, over 22241.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.09043, over 3846721.05 frames. 
], batch size: 93, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:22:33,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5036690.0, ans=0.0 2024-08-21 01:22:45,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5036690.0, ans=0.1 2024-08-21 01:23:29,334 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5036990.0, ans=0.125 2024-08-21 01:23:41,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5036990.0, ans=0.125 2024-08-21 01:23:45,769 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 25 from LS+wenet, 23 from Vox, 21 from AS 2024-08-21 01:23:56,549 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=12.0 2024-08-21 01:24:08,878 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14750, loss[loss=0.1043, beats_loss=0.009142, ecapa_loss=0.0001706, whisper_loss=0.09346, over 22354.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001391, whisper_loss=0.09004, over 3845121.03 frames. 
], batch size: 94, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:24:43,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5037290.0, ans=0.2 2024-08-21 01:24:55,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5037390.0, ans=0.0 2024-08-21 01:25:20,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.178e+01 2.451e+01 2.819e+01 4.132e+01, threshold=4.902e+01, percent-clipped=0.0 2024-08-21 01:25:24,271 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-21 01:25:31,476 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5037590.0, ans=0.5 2024-08-21 01:25:45,616 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 01:25:47,092 INFO [train_multi_KD3.py:1117] (2/4) Epoch 34, batch 14800, loss[loss=0.09582, beats_loss=0.01065, ecapa_loss=0.0001311, whisper_loss=0.08385, over 21464.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001394, whisper_loss=0.0899, over 3865447.85 frames. ], batch size: 83, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:25:48,172 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5037690.0, ans=0.125 2024-08-21 01:25:52,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5037690.0, ans=0.0 2024-08-21 01:26:25,278 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 0, loss[loss=0.09702, beats_loss=0.01029, ecapa_loss=0.0001483, whisper_loss=0.08524, over 14694.00 frames. 
], tot_loss[loss=0.09702, beats_loss=0.01029, ecapa_loss=0.0001483, whisper_loss=0.08524, over 14694.00 frames. ], batch size: 61, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:26:25,278 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-21 01:27:00,109 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2488, over 931116.00 frames. 2024-08-21 01:27:22,045 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2819, 1.5441, 2.2802, 1.5504, 1.5315, 2.3476, 2.2471, 1.7417], device='cuda:2') 2024-08-21 01:27:22,759 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on SV_voxceleb1: loss=0.003936, beats_loss=0, ecapa_loss=0.0003936, whisper_loss=0, over 944235.00 frames. 2024-08-21 01:27:44,775 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2112, 4.0789, 3.7101, 3.9773], device='cuda:2') 2024-08-21 01:28:59,293 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 01:28:59,296 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-21 01:29:18,164 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 01:29:24,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5037850.0, ans=0.125 2024-08-21 01:29:47,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5037850.0, ans=0.125 2024-08-21 01:30:06,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5037950.0, ans=0.125 2024-08-21 01:30:09,542 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 01:30:26,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5038050.0, ans=0.125 2024-08-21 01:30:35,682 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-21 01:30:39,498 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-21 01:30:39,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5038150.0, ans=0.0 2024-08-21 01:30:40,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-21 01:31:05,020 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 50, loss[loss=0.09862, beats_loss=0.009193, ecapa_loss=0.0001694, whisper_loss=0.08773, over 20656.00 frames. ], tot_loss[loss=0.09981, beats_loss=0.009304, ecapa_loss=0.0001483, whisper_loss=0.08902, over 895311.92 frames. 
], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:31:08,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5038250.0, ans=0.0 2024-08-21 01:31:19,696 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 30 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-21 01:31:21,829 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 01:31:33,041 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5038350.0, ans=0.125 2024-08-21 01:32:07,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5038450.0, ans=0.0 2024-08-21 01:32:18,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.525e+01 2.864e+01 3.213e+01 4.437e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-21 01:33:13,879 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 100, loss[loss=0.0904, beats_loss=0.01005, ecapa_loss=0.0001089, whisper_loss=0.07926, over 18506.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.009052, ecapa_loss=0.0001414, whisper_loss=0.08965, over 1530071.86 frames. 
], batch size: 68, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:33:15,461 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5038750.0, ans=0.035 2024-08-21 01:33:42,996 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5038850.0, ans=0.125 2024-08-21 01:33:45,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5038850.0, ans=0.125 2024-08-21 01:34:04,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5038950.0, ans=0.2 2024-08-21 01:34:33,840 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 37 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 01:34:38,707 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 01:35:08,892 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:35:19,888 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 150, loss[loss=0.09839, beats_loss=0.01144, ecapa_loss=0.0001085, whisper_loss=0.08587, over 23481.00 frames. ], tot_loss[loss=0.1, beats_loss=0.009149, ecapa_loss=0.0001406, whisper_loss=0.08949, over 2049670.00 frames. ], batch size: 91, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:35:20,070 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 22 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 01:35:41,783 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 01:36:01,960 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5039350.0, ans=0.125 2024-08-21 01:36:01,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5039350.0, ans=0.125 2024-08-21 01:36:25,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.462e+01 2.718e+01 2.997e+01 1.008e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-21 01:36:41,049 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-21 01:36:47,732 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-21 01:36:53,518 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5039650.0, ans=0.2 2024-08-21 01:37:07,869 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5039750.0, ans=0.0 2024-08-21 01:37:08,662 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 200, loss[loss=0.09881, beats_loss=0.01103, ecapa_loss=0.0001256, whisper_loss=0.08652, over 16920.00 frames. ], tot_loss[loss=0.09909, beats_loss=0.009476, ecapa_loss=0.0001396, whisper_loss=0.08822, over 2425089.89 frames. 
], batch size: 64, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:37:16,979 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5039750.0, ans=0.125 2024-08-21 01:37:17,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5039750.0, ans=0.1 2024-08-21 01:38:10,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5040050.0, ans=0.125 2024-08-21 01:38:20,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.97 vs. limit=22.5 2024-08-21 01:38:30,835 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2024-08-21 01:38:41,742 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 250, loss[loss=0.09037, beats_loss=0.01342, ecapa_loss=0.0001289, whisper_loss=0.07566, over 19520.00 frames. ], tot_loss[loss=0.09966, beats_loss=0.009663, ecapa_loss=0.0001395, whisper_loss=0.0886, over 2700718.05 frames. ], batch size: 80, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:38:47,934 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-21 01:38:51,919 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5040250.0, ans=0.125 2024-08-21 01:39:20,050 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5040450.0, ans=0.125 2024-08-21 01:39:25,559 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5040450.0, ans=0.07 2024-08-21 01:39:27,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5040450.0, ans=0.125 2024-08-21 01:39:38,775 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 10 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-21 01:39:39,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.294e+01 2.516e+01 2.828e+01 4.079e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-21 01:39:40,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5040550.0, ans=0.0 2024-08-21 01:40:07,557 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 16 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-21 01:40:17,472 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 300, loss[loss=0.0931, beats_loss=0.009408, ecapa_loss=0.0001475, whisper_loss=0.08221, over 14741.00 frames. ], tot_loss[loss=0.09884, beats_loss=0.01001, ecapa_loss=0.0001394, whisper_loss=0.08743, over 2949068.41 frames. ], batch size: 57, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:40:21,261 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.170e+05 2024-08-21 01:40:33,930 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 01:41:09,616 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 01:41:09,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5040950.0, ans=0.2 2024-08-21 01:41:18,598 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 01:41:40,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2024-08-21 01:41:42,026 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2024-08-21 01:41:51,469 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 350, loss[loss=0.08712, beats_loss=0.01122, ecapa_loss=0.0001509, whisper_loss=0.07439, over 18239.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01009, ecapa_loss=0.0001396, whisper_loss=0.08839, over 3153006.64 frames. ], batch size: 77, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:41:58,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5041250.0, ans=0.125 2024-08-21 01:41:58,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5041250.0, ans=0.1 2024-08-21 01:42:03,733 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 01:42:07,384 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-21 01:42:26,784 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
16 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-21 01:42:43,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.323e+01 2.496e+01 2.850e+01 5.461e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-21 01:43:05,101 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-21 01:43:17,208 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-21 01:43:17,884 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 01:43:19,307 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 400, loss[loss=0.1095, beats_loss=0.009846, ecapa_loss=0.0001365, whisper_loss=0.09828, over 15174.00 frames. ], tot_loss[loss=0.0996, beats_loss=0.0102, ecapa_loss=0.0001393, whisper_loss=0.088, over 3286839.19 frames. ], batch size: 60, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:43:19,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5041750.0, ans=0.5 2024-08-21 01:43:23,637 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2024-08-21 01:43:25,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5041750.0, ans=0.125 2024-08-21 01:43:32,929 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5041750.0, ans=0.125 2024-08-21 01:43:41,603 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5041850.0, ans=0.2 2024-08-21 01:43:54,203 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
22 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-21 01:44:22,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5042050.0, ans=0.125 2024-08-21 01:44:33,319 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-21 01:44:36,118 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 25 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-21 01:44:37,650 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 24 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-21 01:44:46,306 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2024-08-21 01:44:48,065 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0 2024-08-21 01:44:50,435 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 450, loss[loss=0.09917, beats_loss=0.01132, ecapa_loss=0.0001572, whisper_loss=0.08627, over 23001.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01011, ecapa_loss=0.0001407, whisper_loss=0.08914, over 3404936.81 frames. ], batch size: 94, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:44:52,018 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
16 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-21 01:44:56,998 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:45:12,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5042350.0, ans=0.2 2024-08-21 01:45:17,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5042350.0, ans=0.1 2024-08-21 01:45:43,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.267e+01 2.494e+01 2.807e+01 3.587e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-21 01:46:00,416 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 01:46:20,988 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 500, loss[loss=0.1063, beats_loss=0.009264, ecapa_loss=0.0001142, whisper_loss=0.09594, over 20214.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01012, ecapa_loss=0.000141, whisper_loss=0.08974, over 3437578.48 frames. ], batch size: 76, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:46:21,668 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5042750.0, ans=0.1 2024-08-21 01:47:05,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5042950.0, ans=0.0 2024-08-21 01:47:12,975 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:47:39,733 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
33 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-21 01:47:46,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5043150.0, ans=0.125 2024-08-21 01:47:46,075 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5043150.0, ans=0.0 2024-08-21 01:47:56,658 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 01:47:58,084 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 550, loss[loss=0.1046, beats_loss=0.009962, ecapa_loss=0.0001335, whisper_loss=0.09331, over 22849.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01013, ecapa_loss=0.0001414, whisper_loss=0.08915, over 3511636.04 frames. ], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:48:06,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5043250.0, ans=0.125 2024-08-21 01:48:07,541 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 01:48:12,787 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 01:48:15,010 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-21 01:48:25,992 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 01:48:29,493 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 01:48:44,291 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 01:48:51,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5043450.0, ans=0.2 2024-08-21 01:48:54,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.277e+01 2.484e+01 2.909e+01 4.062e+02, threshold=4.967e+01, percent-clipped=2.0 2024-08-21 01:49:09,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5043550.0, ans=0.125 2024-08-21 01:49:30,498 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 600, loss[loss=0.07548, beats_loss=0.01351, ecapa_loss=8.461e-05, whisper_loss=0.06112, over 17060.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01014, ecapa_loss=0.0001408, whisper_loss=0.0889, over 3521984.25 frames. ], batch size: 66, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:49:45,352 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 13 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-21 01:50:03,811 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5043850.0, ans=0.2 2024-08-21 01:50:45,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=5044150.0, ans=6.0 2024-08-21 01:50:48,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5044150.0, ans=0.0 2024-08-21 01:51:00,064 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 650, loss[loss=0.1076, beats_loss=0.009766, ecapa_loss=0.0001364, whisper_loss=0.09645, over 16113.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01023, ecapa_loss=0.0001394, whisper_loss=0.08884, over 3589833.63 frames. 
], batch size: 61, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:51:07,925 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5044250.0, ans=0.125 2024-08-21 01:51:11,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5044250.0, ans=0.0 2024-08-21 01:51:22,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5044350.0, ans=0.125 2024-08-21 01:51:24,524 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. limit=10.0 2024-08-21 01:51:29,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5044350.0, ans=0.125 2024-08-21 01:51:35,791 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:51:51,920 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.231e+01 2.431e+01 2.762e+01 3.963e+01, threshold=4.863e+01, percent-clipped=0.0 2024-08-21 01:51:53,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5044550.0, ans=0.1 2024-08-21 01:51:54,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5044550.0, ans=0.125 2024-08-21 01:51:54,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5044550.0, ans=0.1 2024-08-21 01:52:11,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5044650.0, ans=0.125 2024-08-21 01:52:13,890 INFO [scaling.py:214] (2/4) 
ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5044650.0, ans=0.2 2024-08-21 01:52:15,574 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 21 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-21 01:52:27,918 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 700, loss[loss=0.0812, beats_loss=0.01138, ecapa_loss=0.0001514, whisper_loss=0.06831, over 14404.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01026, ecapa_loss=0.0001389, whisper_loss=0.08886, over 3648541.91 frames. ], batch size: 59, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:52:39,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5044750.0, ans=0.0 2024-08-21 01:53:13,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-21 01:53:25,085 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5045050.0, ans=0.04949747468305833 2024-08-21 01:53:34,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=5045050.0, ans=0.95 2024-08-21 01:53:34,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5045050.0, ans=0.2 2024-08-21 01:53:48,815 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5045150.0, ans=10.0 2024-08-21 01:53:56,828 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 750, loss[loss=0.09332, beats_loss=0.01152, ecapa_loss=0.0001172, whisper_loss=0.08063, over 22785.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01026, ecapa_loss=0.0001382, whisper_loss=0.08895, over 3680785.95 frames. 
], batch size: 90, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:53:57,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5045250.0, ans=0.125 2024-08-21 01:54:03,210 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.511e-01 2024-08-21 01:54:10,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-21 01:54:14,223 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2024-08-21 01:54:39,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5045450.0, ans=0.2 2024-08-21 01:54:49,506 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.245e+01 2.497e+01 2.753e+01 9.624e+01, threshold=4.995e+01, percent-clipped=1.0 2024-08-21 01:55:00,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5045550.0, ans=0.125 2024-08-21 01:55:08,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5045650.0, ans=0.04949747468305833 2024-08-21 01:55:14,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5045650.0, ans=0.2 2024-08-21 01:55:25,996 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 800, loss[loss=0.1118, beats_loss=0.008686, ecapa_loss=0.0001666, whisper_loss=0.1014, over 20096.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01027, ecapa_loss=0.0001374, whisper_loss=0.08896, over 3726098.86 frames. 
], batch size: 81, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:55:45,126 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5045850.0, ans=0.1 2024-08-21 01:55:50,022 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-21 01:55:58,420 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-21 01:56:05,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5045950.0, ans=0.125 2024-08-21 01:56:09,627 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-21 01:56:31,551 WARNING [optim.py:496] (2/4) Scaling gradients by 0.057700227946043015, model_norm_threshold=49.945823669433594 2024-08-21 01:56:31,720 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.361e+05, grad_sumsq=1.361e+05, orig_rms_sq=1.000e+00 2024-08-21 01:56:41,458 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5046150.0, ans=0.125 2024-08-21 01:56:48,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=5046150.0, ans=15.0 2024-08-21 01:56:49,058 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 01:56:53,866 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 850, loss[loss=0.1081, beats_loss=0.007948, ecapa_loss=0.000185, whisper_loss=0.0983, over 13355.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01026, ecapa_loss=0.000137, whisper_loss=0.08917, over 3726127.77 frames. 
], batch size: 53, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:57:36,040 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2024-08-21 01:57:37,832 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0 2024-08-21 01:57:48,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.265e+01 2.514e+01 2.854e+01 8.656e+02, threshold=5.028e+01, percent-clipped=3.0 2024-08-21 01:57:52,777 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 01:58:07,244 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-21 01:58:09,598 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5046650.0, ans=0.1 2024-08-21 01:58:11,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5046650.0, ans=0.125 2024-08-21 01:58:25,201 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 900, loss[loss=0.1003, beats_loss=0.009463, ecapa_loss=0.0001492, whisper_loss=0.08936, over 19648.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.01029, ecapa_loss=0.0001373, whisper_loss=0.08827, over 3740620.77 frames. 
], batch size: 78, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:58:35,315 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5046750.0, ans=0.1 2024-08-21 01:59:13,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5046950.0, ans=0.05 2024-08-21 01:59:22,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5047050.0, ans=0.0 2024-08-21 01:59:32,539 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 18 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-21 01:59:55,748 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 950, loss[loss=0.09152, beats_loss=0.01049, ecapa_loss=0.0001754, whisper_loss=0.07928, over 16838.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01025, ecapa_loss=0.0001373, whisper_loss=0.08849, over 3738980.19 frames. ], batch size: 71, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:00:02,922 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 02:00:08,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5047250.0, ans=0.125 2024-08-21 02:00:11,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5047350.0, ans=0.125 2024-08-21 02:00:34,254 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 02:00:38,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5047450.0, ans=0.1 2024-08-21 02:00:43,151 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
37 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-21 02:00:48,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.167e+01 2.360e+01 2.609e+01 1.184e+02, threshold=4.721e+01, percent-clipped=1.0 2024-08-21 02:00:49,146 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5047550.0, ans=0.1 2024-08-21 02:00:53,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5047550.0, ans=0.125 2024-08-21 02:00:54,800 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5047550.0, ans=0.07 2024-08-21 02:01:02,995 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5047550.0, ans=0.0 2024-08-21 02:01:17,398 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5047650.0, ans=0.125 2024-08-21 02:01:23,454 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1000, loss[loss=0.1152, beats_loss=0.00851, ecapa_loss=0.0001326, whisper_loss=0.1054, over 21426.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0103, ecapa_loss=0.0001368, whisper_loss=0.08857, over 3733918.14 frames. ], batch size: 82, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:01:30,406 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-08-21 02:01:32,981 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 02:01:41,401 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 02:02:00,293 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-21 02:02:02,685 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:02:41,998 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 27 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-21 02:02:51,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5048150.0, ans=0.1 2024-08-21 02:02:53,584 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1050, loss[loss=0.123, beats_loss=0.007894, ecapa_loss=0.0001198, whisper_loss=0.1139, over 18107.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01024, ecapa_loss=0.0001365, whisper_loss=0.08881, over 3743258.04 frames. ], batch size: 66, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:03:09,037 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 22 from LS+wenet, 18 from Vox, 14 fro AS 2024-08-21 02:03:11,472 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-08-21 02:03:15,056 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5048350.0, ans=0.2 2024-08-21 02:03:18,418 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 02:03:30,106 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
33 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-21 02:03:39,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5048450.0, ans=0.015 2024-08-21 02:03:45,863 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5048450.0, ans=0.125 2024-08-21 02:03:47,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5048450.0, ans=0.1 2024-08-21 02:03:50,214 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.332e+01 2.561e+01 2.821e+01 8.058e+01, threshold=5.122e+01, percent-clipped=2.0 2024-08-21 02:03:58,493 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-21 02:04:01,220 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-21 02:04:12,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5048650.0, ans=0.125 2024-08-21 02:04:15,527 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 02:04:27,079 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1100, loss[loss=0.1003, beats_loss=0.00993, ecapa_loss=0.0001269, whisper_loss=0.08911, over 22927.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01024, ecapa_loss=0.000137, whisper_loss=0.08888, over 3764549.26 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:04:31,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5048750.0, ans=0.125 2024-08-21 02:05:02,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5048950.0, ans=0.2 2024-08-21 02:05:50,991 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5049150.0, ans=10.0 2024-08-21 02:05:58,279 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1150, loss[loss=0.115, beats_loss=0.01018, ecapa_loss=0.0001085, whisper_loss=0.1037, over 24085.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01024, ecapa_loss=0.0001362, whisper_loss=0.08884, over 3728420.21 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:05:59,189 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5049250.0, ans=0.0 2024-08-21 02:06:08,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=12.0 2024-08-21 02:06:38,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5049450.0, ans=0.125 2024-08-21 02:06:40,834 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 02:06:50,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.348e+01 2.579e+01 2.822e+01 4.118e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-21 02:07:25,362 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1200, loss[loss=0.08718, beats_loss=0.011, ecapa_loss=0.0001364, whisper_loss=0.07482, over 20465.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.0102, ecapa_loss=0.0001359, whisper_loss=0.08928, over 3740374.57 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:07:43,787 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-08-21 02:07:56,089 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5049850.0, ans=0.125 2024-08-21 02:08:12,198 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 02:08:23,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5050050.0, ans=0.1 2024-08-21 02:08:27,025 INFO [train_multi_KD3.py:845] (2/4) A total of 97 cuts. 28 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-21 02:08:32,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5050050.0, ans=0.125 2024-08-21 02:08:52,681 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1250, loss[loss=0.1034, beats_loss=0.01163, ecapa_loss=0.0001279, whisper_loss=0.09045, over 14952.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01021, ecapa_loss=0.0001361, whisper_loss=0.08975, over 3756025.60 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:09:04,805 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.11 vs. limit=22.5 2024-08-21 02:09:28,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.33 vs. 
limit=15.0 2024-08-21 02:09:46,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.192e+01 2.364e+01 2.564e+01 4.097e+01, threshold=4.729e+01, percent-clipped=0.0 2024-08-21 02:09:56,560 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-21 02:10:05,540 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 02:10:07,364 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 02:10:07,997 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2024-08-21 02:10:09,229 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 02:10:14,389 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 02:10:23,329 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1300, loss[loss=0.1158, beats_loss=0.007278, ecapa_loss=0.0001347, whisper_loss=0.1072, over 15791.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01026, ecapa_loss=0.0001363, whisper_loss=0.08959, over 3764922.67 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:10:44,460 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 02:10:48,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5050850.0, ans=0.125 2024-08-21 02:10:50,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5050850.0, ans=0.0 2024-08-21 02:10:59,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5050950.0, ans=0.2 2024-08-21 02:11:00,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=15.0 2024-08-21 02:11:07,837 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.93 vs. limit=15.0 2024-08-21 02:11:08,115 WARNING [optim.py:496] (2/4) Scaling gradients by 0.01577102579176426, model_norm_threshold=47.28926467895508 2024-08-21 02:11:08,283 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.852e+06, grad_sumsq=2.852e+06, orig_rms_sq=1.000e+00 2024-08-21 02:11:17,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5051050.0, ans=0.125 2024-08-21 02:11:18,483 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-21 02:11:24,801 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2024-08-21 02:11:27,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5051050.0, ans=0.125 2024-08-21 02:11:37,503 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-08-21 02:11:42,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5051150.0, ans=0.09899494936611666 2024-08-21 02:11:52,005 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1350, loss[loss=0.0876, beats_loss=0.01096, ecapa_loss=0.0001358, whisper_loss=0.07528, over 21353.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01032, ecapa_loss=0.0001363, whisper_loss=0.0894, over 3767741.81 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:12:05,866 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5051250.0, ans=0.0 2024-08-21 02:12:10,653 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 02:12:17,999 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5051350.0, ans=0.125 2024-08-21 02:12:47,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.226e+01 2.529e+01 2.867e+01 2.998e+03, threshold=5.057e+01, percent-clipped=1.0 2024-08-21 02:12:52,700 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 17 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-21 02:13:14,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.92 vs. 
limit=15.0 2024-08-21 02:13:16,312 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=12.0 2024-08-21 02:13:20,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-21 02:13:23,047 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1400, loss[loss=0.07321, beats_loss=0.0109, ecapa_loss=0.0001094, whisper_loss=0.06121, over 13827.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01022, ecapa_loss=0.0001361, whisper_loss=0.09012, over 3765943.60 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:13:35,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5 2024-08-21 02:13:53,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5051850.0, ans=0.125 2024-08-21 02:13:55,642 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 02:13:58,271 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5051850.0, ans=0.0 2024-08-21 02:14:03,488 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-21 02:14:05,240 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 02:14:22,023 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 02:14:23,840 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 02:14:51,379 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
22 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-21 02:14:55,677 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1450, loss[loss=0.1158, beats_loss=0.01002, ecapa_loss=0.0001797, whisper_loss=0.104, over 19201.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01026, ecapa_loss=0.0001358, whisper_loss=0.08985, over 3782411.30 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:14:56,487 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-21 02:15:34,481 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2024-08-21 02:15:40,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5052450.0, ans=0.0 2024-08-21 02:15:49,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.283e+01 2.598e+01 2.871e+01 4.818e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-21 02:16:15,950 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 15 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-21 02:16:23,336 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 28 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-21 02:16:42,135 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1500, loss[loss=0.1148, beats_loss=0.01039, ecapa_loss=0.0001332, whisper_loss=0.1031, over 18864.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01024, ecapa_loss=0.0001362, whisper_loss=0.08923, over 3786337.51 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:16:46,962 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5052750.0, ans=0.125 2024-08-21 02:17:02,248 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 02:17:19,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5052850.0, ans=0.5 2024-08-21 02:17:41,419 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 02:18:01,750 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-21 02:18:02,038 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5053150.0, ans=0.1 2024-08-21 02:18:16,708 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1550, loss[loss=0.06815, beats_loss=0.01296, ecapa_loss=0.0001435, whisper_loss=0.05375, over 12350.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0102, ecapa_loss=0.0001365, whisper_loss=0.08879, over 3794492.07 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:18:49,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5053350.0, ans=0.125 2024-08-21 02:18:55,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5053450.0, ans=0.0 2024-08-21 02:18:58,292 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-21 02:19:00,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5053450.0, ans=0.0 2024-08-21 02:19:02,243 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 18 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-21 02:19:04,376 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5053450.0, ans=0.125 2024-08-21 02:19:10,006 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
22 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-21 02:19:13,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.172e+01 2.388e+01 2.761e+01 1.037e+02, threshold=4.777e+01, percent-clipped=1.0 2024-08-21 02:19:17,280 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 02:19:27,292 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2024-08-21 02:19:32,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5053650.0, ans=0.1 2024-08-21 02:19:34,661 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5053650.0, ans=0.125 2024-08-21 02:19:41,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5053650.0, ans=0.2 2024-08-21 02:19:47,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5053650.0, ans=0.1 2024-08-21 02:19:50,434 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1600, loss[loss=0.1113, beats_loss=0.009987, ecapa_loss=0.000121, whisper_loss=0.1001, over 23132.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01026, ecapa_loss=0.0001355, whisper_loss=0.08912, over 3818021.12 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:19:51,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5053750.0, ans=0.2 2024-08-21 02:19:52,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5053750.0, ans=0.125 2024-08-21 02:20:00,205 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
28 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-21 02:20:05,954 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-21 02:20:15,864 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5053850.0, ans=0.0 2024-08-21 02:20:31,492 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.75 vs. limit=22.5 2024-08-21 02:20:32,648 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 02:20:36,519 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 02:20:43,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5054050.0, ans=0.0 2024-08-21 02:20:49,340 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 02:20:57,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5054050.0, ans=0.125 2024-08-21 02:21:02,839 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 02:21:20,729 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1650, loss[loss=0.08441, beats_loss=0.01061, ecapa_loss=0.0001341, whisper_loss=0.07246, over 16612.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01029, ecapa_loss=0.0001356, whisper_loss=0.08918, over 3819826.74 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:21:26,367 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-21 02:21:28,075 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
23 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-21 02:21:30,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5054250.0, ans=0.125 2024-08-21 02:21:46,795 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 02:22:14,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.289e+01 2.512e+01 2.829e+01 4.001e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 02:22:23,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5054550.0, ans=0.125 2024-08-21 02:22:24,103 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2024-08-21 02:22:25,228 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.21 vs. limit=22.5 2024-08-21 02:22:35,994 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 02:22:45,379 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5054650.0, ans=0.125 2024-08-21 02:22:51,682 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1700, loss[loss=0.09993, beats_loss=0.008414, ecapa_loss=0.0001643, whisper_loss=0.08987, over 17006.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01027, ecapa_loss=0.0001355, whisper_loss=0.08935, over 3836584.77 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:22:52,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5054750.0, ans=0.125 2024-08-21 02:22:54,053 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 02:22:54,113 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5054750.0, ans=0.125 2024-08-21 02:22:56,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5054750.0, ans=0.1 2024-08-21 02:23:09,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5054850.0, ans=0.125 2024-08-21 02:23:27,649 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5054950.0, ans=0.0 2024-08-21 02:23:29,190 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-21 02:24:15,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5055150.0, ans=0.125 2024-08-21 02:24:21,297 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 02:24:23,802 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1750, loss[loss=0.1151, beats_loss=0.009411, ecapa_loss=0.0001487, whisper_loss=0.1042, over 16391.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0102, ecapa_loss=0.0001359, whisper_loss=0.08933, over 3803915.01 frames. ], batch size: 65, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:24:48,635 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 02:24:59,373 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-21 02:25:10,352 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 02:25:18,174 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5055550.0, ans=0.0 2024-08-21 02:25:19,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.242e+01 2.435e+01 2.822e+01 2.727e+02, threshold=4.871e+01, percent-clipped=1.0 2024-08-21 02:25:25,483 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=22.5 2024-08-21 02:25:27,045 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5055550.0, ans=0.0 2024-08-21 02:25:34,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5055550.0, ans=0.2 2024-08-21 02:25:55,140 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1800, loss[loss=0.09184, beats_loss=0.01258, ecapa_loss=0.0001016, whisper_loss=0.07825, over 24221.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01029, ecapa_loss=0.0001344, whisper_loss=0.08901, over 3827785.60 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:26:17,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=5055850.0, ans=0.2 2024-08-21 02:26:22,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5055850.0, ans=0.125 2024-08-21 02:26:27,671 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 
22 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-21 02:26:31,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5055950.0, ans=0.025 2024-08-21 02:26:41,186 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-21 02:27:26,621 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1850, loss[loss=0.09726, beats_loss=0.01144, ecapa_loss=0.0001416, whisper_loss=0.0844, over 22523.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.0001336, whisper_loss=0.08867, over 3816990.45 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:27:29,028 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5056250.0, ans=0.125 2024-08-21 02:27:32,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5056250.0, ans=0.0 2024-08-21 02:27:41,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5056250.0, ans=0.0 2024-08-21 02:27:43,212 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5056350.0, ans=0.0 2024-08-21 02:27:46,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5056350.0, ans=0.0 2024-08-21 02:27:53,812 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5056350.0, ans=0.0 2024-08-21 02:28:01,706 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
25 from LS+wenet, 25 from Vox, 30 from AS 2024-08-21 02:28:21,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.279e+01 2.495e+01 2.833e+01 4.581e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-21 02:28:22,058 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5056550.0, ans=0.2 2024-08-21 02:28:27,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5056550.0, ans=0.125 2024-08-21 02:28:31,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5056550.0, ans=0.0 2024-08-21 02:28:35,724 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5056550.0, ans=0.2 2024-08-21 02:28:45,290 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-21 02:28:54,966 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 from AS 2024-08-21 02:28:57,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5056750.0, ans=0.1 2024-08-21 02:28:58,297 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1900, loss[loss=0.1158, beats_loss=0.009951, ecapa_loss=0.0001329, whisper_loss=0.1045, over 22331.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01038, ecapa_loss=0.0001329, whisper_loss=0.08863, over 3838898.07 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:29:39,866 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 from AS 2024-08-21 02:30:04,553 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
24 from LS+wenet, 16 from Vox, 35 from AS 2024-08-21 02:30:05,535 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-08-21 02:30:16,351 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:30:29,922 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 1950, loss[loss=0.1089, beats_loss=0.01137, ecapa_loss=0.000137, whisper_loss=0.09617, over 21944.00 frames. ], tot_loss[loss=0.09972, beats_loss=0.0104, ecapa_loss=0.0001328, whisper_loss=0.08798, over 3806110.19 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:30:41,932 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5057250.0, ans=0.125 2024-08-21 02:30:50,336 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 29 from LS+wenet, 15 from Vox, 25 from AS 2024-08-21 02:30:53,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5057350.0, ans=0.1 2024-08-21 02:31:02,744 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5057350.0, ans=0.125 2024-08-21 02:31:14,040 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5057450.0, ans=0.125 2024-08-21 02:31:22,244 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
16 from LS+wenet, 19 from Vox, 25 from AS 2024-08-21 02:31:25,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.215e+01 2.451e+01 2.695e+01 5.295e+01, threshold=4.901e+01, percent-clipped=1.0 2024-08-21 02:31:28,260 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5057550.0, ans=10.0 2024-08-21 02:31:38,151 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-21 02:31:38,360 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2024-08-21 02:32:02,173 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2000, loss[loss=0.1056, beats_loss=0.01054, ecapa_loss=0.0001263, whisper_loss=0.09382, over 19678.00 frames. ], tot_loss[loss=0.09982, beats_loss=0.01044, ecapa_loss=0.000132, whisper_loss=0.08806, over 3766030.93 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:32:03,116 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5057750.0, ans=0.125 2024-08-21 02:32:16,896 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
28 from LS+wenet, 17 from Vox, 45 from AS 2024-08-21 02:32:27,004 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5057850.0, ans=0.1 2024-08-21 02:32:40,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5057950.0, ans=0.0 2024-08-21 02:32:45,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5057950.0, ans=0.1 2024-08-21 02:33:07,539 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.65 vs. limit=10.0 2024-08-21 02:33:10,772 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5058050.0, ans=0.0 2024-08-21 02:33:12,304 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 12 from LS+wenet, 15 from Vox, 24 from AS 2024-08-21 02:33:18,889 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 from AS 2024-08-21 02:33:19,334 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.181e+01 2024-08-21 02:33:25,625 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-21 02:33:31,005 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 from AS 2024-08-21 02:33:34,309 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2050, loss[loss=0.08837, beats_loss=0.01307, ecapa_loss=0.0001153, whisper_loss=0.07415, over 17816.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01038, ecapa_loss=0.0001314, whisper_loss=0.08858, over 3759540.32 frames. 
], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:33:52,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5058350.0, ans=0.1 2024-08-21 02:34:08,669 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 from AS 2024-08-21 02:34:24,544 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 31 from LS+wenet, 16 from Vox, 35 from AS 2024-08-21 02:34:30,316 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.290e+01 2.506e+01 2.810e+01 1.281e+02, threshold=5.013e+01, percent-clipped=3.0 2024-08-21 02:34:43,838 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5058550.0, ans=0.125 2024-08-21 02:34:55,274 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5058650.0, ans=0.05 2024-08-21 02:35:04,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.89 vs. limit=22.5 2024-08-21 02:35:06,270 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2100, loss[loss=0.08569, beats_loss=0.01171, ecapa_loss=0.0001594, whisper_loss=0.07238, over 16033.00 frames. ], tot_loss[loss=0.09992, beats_loss=0.01043, ecapa_loss=0.0001319, whisper_loss=0.08817, over 3764011.50 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:35:06,869 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 from AS 2024-08-21 02:35:30,912 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5058850.0, ans=0.0 2024-08-21 02:35:58,207 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
28 from LS+wenet, 19 from Vox, 26 from AS 2024-08-21 02:36:01,779 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 from AS 2024-08-21 02:36:09,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5059050.0, ans=0.125 2024-08-21 02:36:12,190 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 from AS 2024-08-21 02:36:23,810 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5059150.0, ans=0.0 2024-08-21 02:36:26,702 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 18 from LS+wenet, 24 from Vox, 29 from AS 2024-08-21 02:36:37,080 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2150, loss[loss=0.09055, beats_loss=0.01192, ecapa_loss=0.0001417, whisper_loss=0.07721, over 21334.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01026, ecapa_loss=0.0001327, whisper_loss=0.08934, over 3756946.89 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:36:52,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5059250.0, ans=10.0 2024-08-21 02:37:05,557 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 24 from LS+wenet, 20 from Vox, 21 from AS 2024-08-21 02:37:22,165 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5059450.0, ans=0.0 2024-08-21 02:37:34,770 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.50 vs. 
limit=10.0 2024-08-21 02:37:35,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.212e+01 2.472e+01 2.786e+01 4.629e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 02:38:10,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-21 02:38:13,160 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2200, loss[loss=0.09976, beats_loss=0.01209, ecapa_loss=0.0001475, whisper_loss=0.0862, over 18181.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01026, ecapa_loss=0.0001323, whisper_loss=0.08932, over 3754895.74 frames. ], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:38:16,192 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.02 vs. limit=10.0 2024-08-21 02:38:24,057 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-21 02:38:32,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5059850.0, ans=0.0 2024-08-21 02:38:34,600 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5059850.0, ans=0.0 2024-08-21 02:38:50,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. 
limit=15.0 2024-08-21 02:38:57,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5059950.0, ans=0.0 2024-08-21 02:39:01,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5059950.0, ans=0.125 2024-08-21 02:39:19,744 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 33 from LS+wenet, 26 from Vox, 36 from AS 2024-08-21 02:39:43,573 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS 2024-08-21 02:39:45,174 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2250, loss[loss=0.1141, beats_loss=0.01077, ecapa_loss=0.0001267, whisper_loss=0.1021, over 21621.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01029, ecapa_loss=0.0001336, whisper_loss=0.08888, over 3725916.58 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:39:47,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5060250.0, ans=0.1 2024-08-21 02:40:09,135 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 17 from LS+wenet, 25 from Vox, 47 from AS 2024-08-21 02:40:25,025 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 8 from LS+wenet, 20 from Vox, 29 from AS 2024-08-21 02:40:30,300 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-21 02:40:39,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.265e+01 2.538e+01 2.956e+01 4.238e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-21 02:40:47,048 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5060550.0, ans=0.125 2024-08-21 02:40:52,107 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
21 from LS+wenet, 11 from Vox, 21 from AS 2024-08-21 02:41:00,925 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS 2024-08-21 02:41:08,158 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-21 02:41:14,829 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2300, loss[loss=0.1182, beats_loss=0.009073, ecapa_loss=0.0001618, whisper_loss=0.1075, over 22386.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01031, ecapa_loss=0.0001342, whisper_loss=0.08952, over 3730174.49 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:41:16,818 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 13 from LS+wenet, 26 from Vox, 13 from AS 2024-08-21 02:41:17,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5060750.0, ans=0.0 2024-08-21 02:41:32,622 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 from AS 2024-08-21 02:41:40,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5060850.0, ans=0.2 2024-08-21 02:41:41,270 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS 2024-08-21 02:41:45,091 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 23 from LS+wenet, 13 from Vox, 34 from AS 2024-08-21 02:41:53,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5060950.0, ans=0.0 2024-08-21 02:41:59,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5060950.0, ans=0.2 2024-08-21 02:42:10,322 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.54 vs. 
limit=22.5 2024-08-21 02:42:23,665 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 18 from LS+wenet, 23 from Vox, 41 from AS 2024-08-21 02:42:41,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2024-08-21 02:42:48,620 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2350, loss[loss=0.1098, beats_loss=0.00935, ecapa_loss=0.0001411, whisper_loss=0.09903, over 18812.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01033, ecapa_loss=0.0001352, whisper_loss=0.08911, over 3755936.85 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:42:59,261 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-21 02:43:46,345 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5061450.0, ans=0.125 2024-08-21 02:43:50,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.279e+01 2.504e+01 2.806e+01 9.902e+01, threshold=5.007e+01, percent-clipped=2.0 2024-08-21 02:43:50,858 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-21 02:43:53,072 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5061550.0, ans=0.125 2024-08-21 02:43:55,519 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5061550.0, ans=0.025 2024-08-21 02:44:31,002 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2400, loss[loss=0.0947, beats_loss=0.01119, ecapa_loss=0.0001194, whisper_loss=0.08233, over 23131.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001357, whisper_loss=0.08926, over 3776290.96 frames. 
], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:44:36,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5061750.0, ans=0.125 2024-08-21 02:44:45,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5061750.0, ans=0.1 2024-08-21 02:44:52,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5061850.0, ans=0.125 2024-08-21 02:44:55,494 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2024-08-21 02:45:10,992 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.48 vs. limit=10.0 2024-08-21 02:45:30,635 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS 2024-08-21 02:45:43,544 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS 2024-08-21 02:45:45,318 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5062050.0, ans=0.125 2024-08-21 02:45:59,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5062150.0, ans=0.125 2024-08-21 02:46:13,269 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.10 vs. limit=10.0 2024-08-21 02:46:21,411 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
20 from LS+wenet, 19 from Vox, 49 from AS 2024-08-21 02:46:23,533 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2450, loss[loss=0.07908, beats_loss=0.01345, ecapa_loss=0.0001195, whisper_loss=0.06443, over 21195.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001353, whisper_loss=0.08964, over 3796311.30 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:47:13,106 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5062450.0, ans=0.0 2024-08-21 02:47:32,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-21 02:47:34,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.316e+01 2.526e+01 2.772e+01 3.117e+02, threshold=5.053e+01, percent-clipped=1.0 2024-08-21 02:47:51,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5062550.0, ans=0.125 2024-08-21 02:48:22,922 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS 2024-08-21 02:48:24,286 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2500, loss[loss=0.1014, beats_loss=0.01118, ecapa_loss=0.0001258, whisper_loss=0.089, over 21530.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001356, whisper_loss=0.08952, over 3781647.75 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:48:36,840 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 from AS 2024-08-21 02:49:29,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5062950.0, ans=0.0 2024-08-21 02:49:34,726 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
32 from LS+wenet, 22 from Vox, 34 from AS 2024-08-21 02:49:40,888 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 25 from LS+wenet, 15 from Vox, 20 from AS 2024-08-21 02:49:46,062 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5063050.0, ans=0.0 2024-08-21 02:50:13,085 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2550, loss[loss=0.09881, beats_loss=0.01039, ecapa_loss=0.0001082, whisper_loss=0.08734, over 16130.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0103, ecapa_loss=0.0001358, whisper_loss=0.08954, over 3783881.61 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:50:16,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5063250.0, ans=0.125 2024-08-21 02:50:39,486 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 from AS 2024-08-21 02:50:45,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5063350.0, ans=0.0 2024-08-21 02:51:20,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.281e+01 2.497e+01 2.831e+01 4.835e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-21 02:52:03,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5063650.0, ans=0.125 2024-08-21 02:52:09,136 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2600, loss[loss=0.09874, beats_loss=0.01115, ecapa_loss=0.0001196, whisper_loss=0.0864, over 19211.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0103, ecapa_loss=0.0001351, whisper_loss=0.08978, over 3832902.50 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:52:25,772 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
26 from LS+wenet, 19 from Vox, 41 from AS 2024-08-21 02:52:29,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5063750.0, ans=0.0 2024-08-21 02:52:35,562 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5063850.0, ans=0.1 2024-08-21 02:52:48,460 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5063850.0, ans=0.0 2024-08-21 02:52:54,716 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 from AS 2024-08-21 02:53:04,801 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 9 from LS+wenet, 19 from Vox, 24 from AS 2024-08-21 02:53:07,856 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 30 from Vox, 25 from AS 2024-08-21 02:53:28,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5064050.0, ans=0.125 2024-08-21 02:53:35,465 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 from AS 2024-08-21 02:54:06,494 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 from AS 2024-08-21 02:54:14,176 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 20 from Vox, 45 from AS 2024-08-21 02:54:21,845 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2650, loss[loss=0.107, beats_loss=0.007233, ecapa_loss=0.0001336, whisper_loss=0.09839, over 20453.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01028, ecapa_loss=0.0001368, whisper_loss=0.08963, over 3837058.05 frames. 
], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:54:40,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5064250.0, ans=0.125 2024-08-21 02:54:43,653 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-21 02:54:48,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5064350.0, ans=0.125 2024-08-21 02:55:00,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5064350.0, ans=0.125 2024-08-21 02:55:08,955 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 37 from LS+wenet, 25 from Vox, 33 from AS 2024-08-21 02:55:28,336 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-21 02:55:41,675 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.306e+01 2.544e+01 2.939e+01 3.967e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-21 02:55:59,544 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 21 from LS+wenet, 18 from Vox, 42 from AS 2024-08-21 02:56:00,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5064550.0, ans=0.125 2024-08-21 02:56:03,709 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5064550.0, ans=0.1 2024-08-21 02:56:05,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=12.0 2024-08-21 02:56:24,285 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 
33 from LS+wenet, 24 from Vox, 37 from AS 2024-08-21 02:56:32,250 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2700, loss[loss=0.07806, beats_loss=0.01105, ecapa_loss=0.0001454, whisper_loss=0.06556, over 13602.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001374, whisper_loss=0.08948, over 3838491.12 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:56:39,030 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5064750.0, ans=0.0 2024-08-21 02:56:52,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5064750.0, ans=0.125 2024-08-21 02:57:37,159 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 13 from LS+wenet, 23 from Vox, 31 from AS 2024-08-21 02:57:44,987 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 from AS 2024-08-21 02:57:46,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5064950.0, ans=10.0 2024-08-21 02:57:51,771 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 21 from LS+wenet, 26 from Vox, 46 from AS 2024-08-21 02:58:03,114 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 from AS 2024-08-21 02:58:42,138 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2750, loss[loss=0.0915, beats_loss=0.01171, ecapa_loss=0.0001525, whisper_loss=0.07826, over 21189.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01034, ecapa_loss=0.0001377, whisper_loss=0.08902, over 3831945.89 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:58:46,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.38 vs. 
limit=22.5 2024-08-21 02:59:21,103 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 24 from Vox, 26 from AS 2024-08-21 02:59:59,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.389e+01 2.549e+01 2.769e+01 6.929e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-21 03:00:16,620 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5065550.0, ans=0.125 2024-08-21 03:00:32,927 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 32 from LS+wenet, 15 from Vox, 47 from AS 2024-08-21 03:00:39,002 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 from AS 2024-08-21 03:00:45,313 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5065650.0, ans=0.0 2024-08-21 03:00:48,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2800, loss[loss=0.1003, beats_loss=0.01013, ecapa_loss=0.0001386, whisper_loss=0.08883, over 16748.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01034, ecapa_loss=0.0001369, whisper_loss=0.0891, over 3839471.70 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:00:48,928 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 33 from LS+wenet, 20 from Vox, 32 from AS 2024-08-21 03:01:04,466 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 from AS 2024-08-21 03:01:06,562 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 18 from LS+wenet, 19 from Vox, 40 from AS 2024-08-21 03:01:08,142 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5065750.0, ans=0.125 2024-08-21 03:01:26,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.27 vs. 
limit=15.0 2024-08-21 03:02:08,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5066050.0, ans=0.0 2024-08-21 03:02:28,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-08-21 03:02:56,663 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2850, loss[loss=0.11, beats_loss=0.01146, ecapa_loss=0.0001358, whisper_loss=0.09723, over 23043.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01035, ecapa_loss=0.0001371, whisper_loss=0.08895, over 3804288.54 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:02:58,845 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.74 vs. limit=6.0 2024-08-21 03:03:35,467 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=5066350.0, ans=0.02 2024-08-21 03:03:39,086 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 from AS 2024-08-21 03:03:40,589 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5066350.0, ans=0.125 2024-08-21 03:04:01,082 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5066450.0, ans=0.0 2024-08-21 03:04:09,495 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
34 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 03:04:14,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.291e+01 2.516e+01 2.868e+01 4.695e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-21 03:04:39,891 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5066650.0, ans=0.0 2024-08-21 03:04:48,646 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=5066650.0, ans=10.0 2024-08-21 03:04:50,897 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5066650.0, ans=0.0 2024-08-21 03:05:07,354 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2900, loss[loss=0.1072, beats_loss=0.013, ecapa_loss=0.0001053, whisper_loss=0.09319, over 18669.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001375, whisper_loss=0.08929, over 3820393.04 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:05:21,392 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 03:05:43,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5066850.0, ans=0.0 2024-08-21 03:05:45,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-08-21 03:05:59,034 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 03:05:59,985 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.50 vs. 
limit=22.5 2024-08-21 03:06:03,696 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5066950.0, ans=0.125 2024-08-21 03:06:07,638 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=12.0 2024-08-21 03:06:11,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5066950.0, ans=0.0 2024-08-21 03:06:24,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5067050.0, ans=0.125 2024-08-21 03:06:27,912 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 23 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-21 03:06:32,328 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:06:45,803 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5067150.0, ans=0.95 2024-08-21 03:07:10,185 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 2950, loss[loss=0.08342, beats_loss=0.01053, ecapa_loss=0.0001237, whisper_loss=0.07165, over 22162.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01043, ecapa_loss=0.0001384, whisper_loss=0.08864, over 3832719.47 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:07:13,263 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 
28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-21 03:07:42,176 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=5067350.0, ans=0.05 2024-08-21 03:07:51,935 WARNING [optim.py:496] (2/4) Scaling gradients by 0.03678512200713158, model_norm_threshold=50.32452392578125 2024-08-21 03:07:52,103 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.620e+05, grad_sumsq=7.962e+04, orig_rms_sq=3.290e+00 2024-08-21 03:08:19,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.280e+01 2.501e+01 2.875e+01 1.368e+03, threshold=5.003e+01, percent-clipped=1.0 2024-08-21 03:08:24,018 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 15 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 03:08:35,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5067550.0, ans=0.125 2024-08-21 03:08:40,406 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5067650.0, ans=0.2 2024-08-21 03:08:43,575 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-21 03:08:50,938 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5067650.0, ans=0.125 2024-08-21 03:09:02,341 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3000, loss[loss=0.09458, beats_loss=0.01103, ecapa_loss=0.0001111, whisper_loss=0.08244, over 14570.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001379, whisper_loss=0.08927, over 3804198.64 frames. 
], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:09:02,342 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-21 03:09:39,336 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2496, over 931116.00 frames. 2024-08-21 03:10:01,784 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on SV_voxceleb1: loss=0.003899, beats_loss=0, ecapa_loss=0.0003899, whisper_loss=0, over 944235.00 frames. 2024-08-21 03:11:41,966 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 03:11:41,970 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-21 03:11:48,041 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-08-21 03:12:05,899 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-21 03:12:20,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5067950.0, ans=0.0 2024-08-21 03:12:25,301 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 03:12:26,594 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5067950.0, ans=0.1 2024-08-21 03:12:40,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5068050.0, ans=0.0 2024-08-21 03:12:50,516 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
37 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 03:13:12,664 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3050, loss[loss=0.09723, beats_loss=0.009852, ecapa_loss=0.0001305, whisper_loss=0.08608, over 16807.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001379, whisper_loss=0.08964, over 3838549.06 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:13:42,897 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2024-08-21 03:14:02,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5068450.0, ans=0.1 2024-08-21 03:14:08,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.251e+01 2.540e+01 2.788e+01 3.733e+01, threshold=5.081e+01, percent-clipped=0.0 2024-08-21 03:14:23,094 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-21 03:14:29,229 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:14:30,259 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 03:14:30,647 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5068650.0, ans=0.1 2024-08-21 03:14:44,375 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3100, loss[loss=0.09717, beats_loss=0.01308, ecapa_loss=0.0001344, whisper_loss=0.08275, over 22557.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001392, whisper_loss=0.09027, over 3849616.30 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 03:14:48,013 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-21 03:15:16,479 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5068850.0, ans=0.125 2024-08-21 03:15:51,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5069050.0, ans=0.125 2024-08-21 03:16:06,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-21 03:16:17,551 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3150, loss[loss=0.09137, beats_loss=0.01004, ecapa_loss=0.0001439, whisper_loss=0.07989, over 18972.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.00014, whisper_loss=0.09038, over 3830217.42 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 03:16:28,840 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5069250.0, ans=0.125 2024-08-21 03:16:38,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5069350.0, ans=0.0 2024-08-21 03:16:50,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5069350.0, ans=0.0 2024-08-21 03:16:56,320 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5069450.0, ans=0.0 2024-08-21 03:17:12,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.426e+01 2.655e+01 2.939e+01 1.391e+02, threshold=5.310e+01, percent-clipped=2.0 2024-08-21 03:17:13,429 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5069550.0, ans=0.1 2024-08-21 03:17:27,580 INFO [scaling.py:214] (2/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5069550.0, ans=0.125 2024-08-21 03:17:30,685 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 03:17:32,333 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-21 03:17:38,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5069650.0, ans=0.0 2024-08-21 03:17:44,997 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 03:17:48,329 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3200, loss[loss=0.101, beats_loss=0.009988, ecapa_loss=0.0001546, whisper_loss=0.0895, over 20230.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.08962, over 3818305.94 frames. ], batch size: 83, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 03:17:50,997 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5069750.0, ans=0.1 2024-08-21 03:17:56,744 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2024-08-21 03:18:05,821 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2024-08-21 03:18:14,243 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5069850.0, ans=0.125 2024-08-21 03:18:22,800 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-21 03:18:47,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5070050.0, ans=0.2 2024-08-21 03:19:04,720 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5070150.0, ans=0.125 2024-08-21 03:19:07,052 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5070150.0, ans=0.0 2024-08-21 03:19:09,152 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5070150.0, ans=0.1 2024-08-21 03:19:19,349 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3250, loss[loss=0.0852, beats_loss=0.01392, ecapa_loss=0.0001085, whisper_loss=0.07019, over 19711.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.08995, over 3789897.72 frames. ], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:19:31,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5070250.0, ans=0.125 2024-08-21 03:19:59,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=12.0 2024-08-21 03:20:19,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5070450.0, ans=0.0 2024-08-21 03:20:24,121 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.311e+01 2.569e+01 2.814e+01 1.085e+02, threshold=5.138e+01, percent-clipped=2.0 2024-08-21 03:21:06,347 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3300, loss[loss=0.1079, beats_loss=0.01028, ecapa_loss=0.0001099, whisper_loss=0.09652, over 18843.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001377, whisper_loss=0.09066, over 3799941.65 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:21:11,185 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2024-08-21 03:21:18,935 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-21 03:21:22,218 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5070750.0, ans=0.0 2024-08-21 03:21:31,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5070850.0, ans=0.2 2024-08-21 03:21:39,183 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 03:21:43,444 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 03:21:47,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5070850.0, ans=0.0 2024-08-21 03:21:55,496 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=12.0 2024-08-21 03:22:47,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5071150.0, ans=0.0 2024-08-21 03:22:58,852 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3350, loss[loss=0.09099, beats_loss=0.01121, ecapa_loss=0.0001369, whisper_loss=0.0784, over 14448.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01028, ecapa_loss=0.0001391, whisper_loss=0.09109, over 3790678.81 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:23:12,442 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 03:23:20,640 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.53 vs. limit=10.0 2024-08-21 03:23:39,250 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 19 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-21 03:23:40,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5071350.0, ans=0.125 2024-08-21 03:23:43,766 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2024-08-21 03:23:56,977 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-21 03:24:09,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.266e+01 2.448e+01 2.718e+01 4.054e+01, threshold=4.896e+01, percent-clipped=0.0 2024-08-21 03:24:56,846 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3400, loss[loss=0.115, beats_loss=0.009235, ecapa_loss=0.0001474, whisper_loss=0.1043, over 22130.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001394, whisper_loss=0.09055, over 3751201.53 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:25:00,951 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5071750.0, ans=0.0 2024-08-21 03:25:03,164 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5071750.0, ans=0.035 2024-08-21 03:25:04,316 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
20 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-21 03:25:15,590 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2024-08-21 03:25:22,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5071850.0, ans=0.125 2024-08-21 03:25:31,367 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 03:26:19,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5072050.0, ans=0.125 2024-08-21 03:26:21,390 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5072050.0, ans=0.125 2024-08-21 03:26:35,741 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-21 03:26:43,885 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5072150.0, ans=0.125 2024-08-21 03:26:57,264 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3450, loss[loss=0.08767, beats_loss=0.01039, ecapa_loss=0.0001173, whisper_loss=0.07611, over 13825.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01027, ecapa_loss=0.0001395, whisper_loss=0.09085, over 3743548.34 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:27:04,035 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5072250.0, ans=0.1 2024-08-21 03:27:04,429 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. 
limit=15.0 2024-08-21 03:27:09,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5072250.0, ans=0.1 2024-08-21 03:27:47,358 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-08-21 03:27:53,612 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5072450.0, ans=0.1 2024-08-21 03:28:09,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.313e+01 2.518e+01 2.811e+01 5.199e+01, threshold=5.037e+01, percent-clipped=1.0 2024-08-21 03:28:12,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5072550.0, ans=0.125 2024-08-21 03:28:45,822 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 13 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-21 03:28:51,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5072750.0, ans=0.125 2024-08-21 03:28:52,415 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3500, loss[loss=0.0879, beats_loss=0.01133, ecapa_loss=0.0001636, whisper_loss=0.07494, over 13279.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01028, ecapa_loss=0.0001389, whisper_loss=0.09052, over 3750359.20 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:28:54,913 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 03:29:17,908 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5072850.0, ans=0.0 2024-08-21 03:29:22,977 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
37 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 03:29:46,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5072950.0, ans=0.5 2024-08-21 03:29:51,552 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5072950.0, ans=0.1 2024-08-21 03:30:11,936 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 03:30:24,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5073150.0, ans=0.125 2024-08-21 03:30:38,114 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5073150.0, ans=0.0 2024-08-21 03:30:44,519 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3550, loss[loss=0.08464, beats_loss=0.01296, ecapa_loss=0.0001406, whisper_loss=0.07028, over 14047.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.000139, whisper_loss=0.09016, over 3777224.07 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:31:00,629 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5073250.0, ans=0.125 2024-08-21 03:31:10,361 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.19 vs. 
limit=22.5 2024-08-21 03:31:24,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=5073350.0, ans=0.2 2024-08-21 03:31:46,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5073450.0, ans=0.0 2024-08-21 03:31:52,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.266e+01 2.535e+01 2.803e+01 1.045e+02, threshold=5.070e+01, percent-clipped=1.0 2024-08-21 03:32:07,669 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 03:32:28,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5073650.0, ans=0.125 2024-08-21 03:32:28,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5073650.0, ans=0.2 2024-08-21 03:32:38,820 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3600, loss[loss=0.1014, beats_loss=0.01148, ecapa_loss=9.934e-05, whisper_loss=0.08893, over 22795.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.000139, whisper_loss=0.09059, over 3779403.51 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:32:54,780 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-21 03:33:03,336 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 9 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 03:33:38,631 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 03:33:50,500 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-21 03:34:07,703 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.96 vs. 
limit=22.5 2024-08-21 03:34:29,625 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5074150.0, ans=0.125 2024-08-21 03:34:35,478 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3650, loss[loss=0.1033, beats_loss=0.01032, ecapa_loss=0.0001558, whisper_loss=0.09144, over 21171.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01032, ecapa_loss=0.0001382, whisper_loss=0.09025, over 3799852.31 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:34:45,466 INFO [train_multi_KD3.py:845] (2/4) A total of 51 cuts. 11 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 03:35:16,310 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=22.5 2024-08-21 03:35:38,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5074450.0, ans=0.1 2024-08-21 03:35:48,665 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.270e+01 2.492e+01 2.659e+01 4.040e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 03:36:11,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=12.0 2024-08-21 03:36:34,372 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3700, loss[loss=0.1039, beats_loss=0.008302, ecapa_loss=0.0001407, whisper_loss=0.09423, over 14079.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001378, whisper_loss=0.09008, over 3793897.21 frames. 
], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:36:56,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5074750.0, ans=0.2 2024-08-21 03:37:34,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5074950.0, ans=0.1 2024-08-21 03:37:48,742 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-21 03:38:04,445 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 03:38:34,077 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3750, loss[loss=0.1101, beats_loss=0.008552, ecapa_loss=0.0001183, whisper_loss=0.1004, over 13282.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001373, whisper_loss=0.09007, over 3758302.43 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:38:48,031 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5075250.0, ans=0.1 2024-08-21 03:38:48,490 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2024-08-21 03:39:05,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5075350.0, ans=0.1 2024-08-21 03:39:08,441 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 03:39:50,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.197e+01 2.452e+01 2.774e+01 3.553e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-21 03:39:54,119 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-21 03:39:55,014 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5075550.0, ans=0.125 2024-08-21 03:39:59,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5075550.0, ans=0.125 2024-08-21 03:40:35,822 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3800, loss[loss=0.1029, beats_loss=0.01187, ecapa_loss=0.0001048, whisper_loss=0.08995, over 17859.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001375, whisper_loss=0.09075, over 3782458.63 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:41:15,781 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 03:41:30,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5075950.0, ans=0.2 2024-08-21 03:42:14,166 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 03:42:17,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5076150.0, ans=0.1 2024-08-21 03:42:38,126 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3850, loss[loss=0.1041, beats_loss=0.009718, ecapa_loss=0.0001333, whisper_loss=0.09301, over 22915.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001381, whisper_loss=0.09045, over 3790889.60 frames. 
], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:42:54,430 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5076250.0, ans=0.1 2024-08-21 03:42:59,962 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-21 03:43:10,782 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-21 03:43:27,241 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5076450.0, ans=0.125 2024-08-21 03:43:34,515 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5076450.0, ans=0.0 2024-08-21 03:43:37,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-21 03:43:50,946 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.237e+01 2.453e+01 2.696e+01 3.570e+01, threshold=4.906e+01, percent-clipped=0.0 2024-08-21 03:43:59,400 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 18 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-21 03:44:17,574 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5076650.0, ans=0.0 2024-08-21 03:44:31,616 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5076650.0, ans=0.2 2024-08-21 03:44:38,211 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3900, loss[loss=0.08064, beats_loss=0.01231, ecapa_loss=0.0001199, whisper_loss=0.06714, over 18799.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001372, whisper_loss=0.08994, over 3788177.10 frames. 
], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:44:43,024 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 34 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-21 03:44:45,023 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-21 03:44:57,865 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.634e-01 2024-08-21 03:45:15,564 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-21 03:45:26,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5076950.0, ans=0.035 2024-08-21 03:45:48,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5077050.0, ans=0.0 2024-08-21 03:46:03,623 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=22.5 2024-08-21 03:46:21,399 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 27 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-21 03:46:26,023 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 03:46:39,107 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 3950, loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001697, whisper_loss=0.09324, over 20097.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001373, whisper_loss=0.09075, over 3814146.95 frames. 
], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:46:50,343 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5077250.0, ans=0.1 2024-08-21 03:46:59,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5077250.0, ans=0.2 2024-08-21 03:47:06,051 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-21 03:47:29,191 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5077450.0, ans=0.125 2024-08-21 03:47:51,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.265e+01 2.547e+01 2.985e+01 6.857e+01, threshold=5.095e+01, percent-clipped=1.0 2024-08-21 03:47:55,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5 2024-08-21 03:48:33,824 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 03:48:39,327 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4000, loss[loss=0.09223, beats_loss=0.0102, ecapa_loss=0.0001754, whisper_loss=0.08028, over 15901.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01022, ecapa_loss=0.0001375, whisper_loss=0.09192, over 3816586.21 frames. 
], batch size: 65, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:48:43,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5077750.0, ans=0.05 2024-08-21 03:48:45,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5077750.0, ans=0.5 2024-08-21 03:49:02,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5077850.0, ans=0.0 2024-08-21 03:49:21,302 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 37 from Vox, 29 fro AS 2024-08-21 03:49:22,540 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5077850.0, ans=0.1 2024-08-21 03:49:48,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5077950.0, ans=0.0 2024-08-21 03:49:48,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5077950.0, ans=0.2 2024-08-21 03:49:51,270 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-08-21 03:50:39,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5078150.0, ans=0.125 2024-08-21 03:50:39,796 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5078150.0, ans=0.5 2024-08-21 03:50:48,817 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4050, loss[loss=0.1133, beats_loss=0.009378, ecapa_loss=0.0001705, whisper_loss=0.1022, over 22141.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01021, ecapa_loss=0.0001388, whisper_loss=0.09184, over 3840474.92 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:51:48,722 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-21 03:51:49,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5078450.0, ans=0.125 2024-08-21 03:52:04,365 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5078450.0, ans=0.0 2024-08-21 03:52:10,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.320e+01 2.567e+01 2.893e+01 7.952e+01, threshold=5.134e+01, percent-clipped=3.0 2024-08-21 03:52:26,300 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=12.0 2024-08-21 03:52:41,456 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-21 03:52:50,393 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5078650.0, ans=0.125 2024-08-21 03:53:01,435 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4100, loss[loss=0.08867, beats_loss=0.01121, ecapa_loss=0.0001523, whisper_loss=0.07593, over 13737.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01028, ecapa_loss=0.000139, whisper_loss=0.09119, over 3840626.22 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:53:15,630 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.67 vs. 
limit=10.0 2024-08-21 03:53:16,243 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0443761944770813, model_norm_threshold=51.335693359375 2024-08-21 03:53:16,411 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.724e+05, grad_sumsq=2.524e+07, orig_rms_sq=1.079e-02 2024-08-21 03:53:20,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5078750.0, ans=0.2 2024-08-21 03:53:39,620 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 03:54:15,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5078950.0, ans=0.1 2024-08-21 03:54:46,778 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5079150.0, ans=0.1 2024-08-21 03:54:47,861 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 03:55:07,316 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:55:10,838 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4150, loss[loss=0.1096, beats_loss=0.00949, ecapa_loss=0.0001493, whisper_loss=0.09864, over 19188.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01027, ecapa_loss=0.0001392, whisper_loss=0.09123, over 3855569.04 frames. 
], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:55:12,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5079250.0, ans=0.125 2024-08-21 03:55:20,018 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5079250.0, ans=0.0 2024-08-21 03:55:33,451 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5079250.0, ans=0.125 2024-08-21 03:55:49,901 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-21 03:56:07,641 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-21 03:56:10,510 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 03:56:32,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.340e+01 2.592e+01 2.879e+01 1.157e+03, threshold=5.184e+01, percent-clipped=4.0 2024-08-21 03:56:41,168 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-21 03:56:45,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5079550.0, ans=0.125 2024-08-21 03:57:09,764 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5079650.0, ans=0.125 2024-08-21 03:57:11,020 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 03:57:17,870 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4200, loss[loss=0.09008, beats_loss=0.009853, ecapa_loss=0.00011, whisper_loss=0.07912, over 14886.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001391, whisper_loss=0.09116, over 3840668.89 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:57:18,134 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-21 03:57:18,419 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5079750.0, ans=0.125 2024-08-21 03:57:43,469 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 27 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-21 03:57:50,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5079850.0, ans=0.0 2024-08-21 03:58:14,135 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5079950.0, ans=0.2 2024-08-21 03:58:31,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=22.5 2024-08-21 03:58:39,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5080050.0, ans=0.0 2024-08-21 03:58:46,697 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5080050.0, ans=0.035 2024-08-21 03:58:53,653 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 03:58:55,460 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-21 03:58:56,120 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 
23 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-21 03:59:12,222 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-21 03:59:20,461 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4250, loss[loss=0.0882, beats_loss=0.009548, ecapa_loss=0.0001579, whisper_loss=0.07707, over 21073.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09093, over 3842753.91 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:59:20,709 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-21 03:59:25,157 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 39 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-21 03:59:46,674 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-08-21 03:59:49,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5080350.0, ans=0.125 2024-08-21 04:00:13,804 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5080450.0, ans=0.125 2024-08-21 04:00:40,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.302e+01 2.518e+01 2.832e+01 1.053e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-21 04:00:42,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5080550.0, ans=0.1 2024-08-21 04:01:29,538 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4300, loss[loss=0.1014, beats_loss=0.01121, ecapa_loss=0.0001432, whisper_loss=0.08875, over 18429.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.09076, over 3826717.12 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:01:43,360 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5080750.0, ans=0.1 2024-08-21 04:02:14,382 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-08-21 04:02:53,227 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5081050.0, ans=0.0 2024-08-21 04:03:03,736 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 19 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 04:03:21,578 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-21 04:03:30,185 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4350, loss[loss=0.09494, beats_loss=0.0113, ecapa_loss=0.0001284, whisper_loss=0.08236, over 21646.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.08997, over 3837946.72 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:03:34,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5081250.0, ans=0.125 2024-08-21 04:03:41,488 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5081250.0, ans=0.125 2024-08-21 04:04:17,942 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-21 04:04:25,787 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 
10 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 04:04:32,195 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-21 04:04:37,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.205e+01 2.430e+01 2.775e+01 4.634e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 04:04:37,209 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 21 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-21 04:04:38,522 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5081550.0, ans=0.0 2024-08-21 04:04:50,655 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5081550.0, ans=0.0 2024-08-21 04:05:03,157 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-08-21 04:05:06,338 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 16 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-21 04:05:09,439 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2024-08-21 04:05:12,346 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 04:05:19,894 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4400, loss[loss=0.09692, beats_loss=0.01143, ecapa_loss=0.0001329, whisper_loss=0.08416, over 21408.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001391, whisper_loss=0.08954, over 3840579.89 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:05:45,145 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.78 vs. 
limit=15.0 2024-08-21 04:05:56,725 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 25 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-21 04:06:01,793 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.68 vs. limit=10.0 2024-08-21 04:06:07,711 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 15 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 04:06:14,706 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-08-21 04:06:20,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5081950.0, ans=0.125 2024-08-21 04:07:04,096 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5082050.0, ans=0.2 2024-08-21 04:07:09,751 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 04:07:31,852 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4450, loss[loss=0.09904, beats_loss=0.0122, ecapa_loss=0.0001321, whisper_loss=0.08552, over 17863.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001387, whisper_loss=0.08939, over 3818343.78 frames. 
], batch size: 70, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:07:41,198 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5082250.0, ans=0.2 2024-08-21 04:08:14,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5082350.0, ans=0.125 2024-08-21 04:08:21,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5082450.0, ans=0.025 2024-08-21 04:08:28,847 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-21 04:08:51,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.213e+01 2.422e+01 2.731e+01 3.413e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-21 04:09:01,262 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5082550.0, ans=0.1 2024-08-21 04:09:06,042 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5082550.0, ans=0.1 2024-08-21 04:09:42,527 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4500, loss[loss=0.09418, beats_loss=0.01035, ecapa_loss=0.0001792, whisper_loss=0.08204, over 19721.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.00014, whisper_loss=0.08979, over 3814889.55 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:09:56,532 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-21 04:10:26,433 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5082850.0, ans=10.0 2024-08-21 04:10:36,072 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:10:36,092 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:10:37,118 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-21 04:10:45,851 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=22.5 2024-08-21 04:11:53,289 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4550, loss[loss=0.1029, beats_loss=0.008661, ecapa_loss=0.000151, whisper_loss=0.09277, over 18734.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.08959, over 3818105.96 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:12:14,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5083250.0, ans=0.5 2024-08-21 04:12:20,844 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-21 04:12:31,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5083350.0, ans=0.0 2024-08-21 04:12:39,649 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.91 vs. limit=10.0 2024-08-21 04:12:53,779 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=15.0 2024-08-21 04:12:59,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5083450.0, ans=0.125 2024-08-21 04:13:09,702 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-21 04:13:13,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.342e+01 2.629e+01 2.950e+01 5.025e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-21 04:13:35,176 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0 2024-08-21 04:13:40,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5083650.0, ans=0.125 2024-08-21 04:13:40,681 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-08-21 04:13:43,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-08-21 04:13:47,005 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-21 04:13:54,694 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 04:14:04,397 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4600, loss[loss=0.1241, beats_loss=0.008498, ecapa_loss=0.0001533, whisper_loss=0.1141, over 21671.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001394, whisper_loss=0.08952, over 3794822.01 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:14:04,662 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 04:14:22,464 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5083750.0, ans=0.0 2024-08-21 04:15:32,901 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 04:15:37,289 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5084050.0, ans=0.125 2024-08-21 04:15:42,927 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5084150.0, ans=0.125 2024-08-21 04:15:48,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5084150.0, ans=0.0 2024-08-21 04:15:58,756 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5084150.0, ans=0.0 2024-08-21 04:16:04,146 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:16:07,704 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4650, loss[loss=0.0897, beats_loss=0.01195, ecapa_loss=0.0001548, whisper_loss=0.0762, over 16260.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01038, ecapa_loss=0.0001395, whisper_loss=0.0888, over 3757536.38 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:16:22,241 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-21 04:17:09,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5084450.0, ans=0.1 2024-08-21 04:17:20,314 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 
17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 04:17:27,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.290e+01 2.552e+01 2.874e+01 1.481e+02, threshold=5.104e+01, percent-clipped=2.0 2024-08-21 04:18:07,936 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-21 04:18:10,286 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5084650.0, ans=0.125 2024-08-21 04:18:14,919 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4700, loss[loss=0.09252, beats_loss=0.01183, ecapa_loss=0.000129, whisper_loss=0.0794, over 15760.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.08885, over 3729967.60 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:18:35,086 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=12.0 2024-08-21 04:18:39,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5084850.0, ans=0.125 2024-08-21 04:19:11,977 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.527e+00 2024-08-21 04:19:15,431 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 04:19:26,611 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-21 04:19:41,691 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. 
limit=15.0 2024-08-21 04:19:43,555 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5085050.0, ans=0.125 2024-08-21 04:19:43,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5085050.0, ans=0.125 2024-08-21 04:19:44,771 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 04:20:00,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.30 vs. limit=10.0 2024-08-21 04:20:25,053 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4750, loss[loss=0.1141, beats_loss=0.00796, ecapa_loss=0.0001812, whisper_loss=0.1043, over 19087.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01038, ecapa_loss=0.0001407, whisper_loss=0.08913, over 3750460.07 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:20:33,312 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-21 04:21:01,819 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 39 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 04:21:02,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5085350.0, ans=0.125 2024-08-21 04:21:12,096 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 04:21:36,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. 
limit=15.0 2024-08-21 04:21:44,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.249e+01 2.433e+01 2.723e+01 6.483e+01, threshold=4.865e+01, percent-clipped=1.0 2024-08-21 04:22:03,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5085550.0, ans=0.0 2024-08-21 04:22:06,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.89 vs. limit=22.5 2024-08-21 04:22:33,058 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4800, loss[loss=0.1092, beats_loss=0.01098, ecapa_loss=0.0001637, whisper_loss=0.09663, over 17848.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.000139, whisper_loss=0.08944, over 3774128.87 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:22:40,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5085750.0, ans=0.0 2024-08-21 04:23:11,593 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5 2024-08-21 04:23:26,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0 2024-08-21 04:23:33,578 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-21 04:23:57,802 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-21 04:24:22,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5086150.0, ans=0.125 2024-08-21 04:24:23,937 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
16 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-21 04:24:39,289 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4850, loss[loss=0.08961, beats_loss=0.01429, ecapa_loss=8.621e-05, whisper_loss=0.07446, over 14983.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001393, whisper_loss=0.08947, over 3757199.72 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:24:46,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5086250.0, ans=0.0 2024-08-21 04:24:54,224 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5086250.0, ans=0.09899494936611666 2024-08-21 04:24:56,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5 2024-08-21 04:25:07,621 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 04:25:12,151 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-21 04:25:41,190 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-08-21 04:25:44,386 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5086450.0, ans=0.1 2024-08-21 04:25:51,403 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
19 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-21 04:25:56,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.255e+01 2.433e+01 2.647e+01 4.364e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-21 04:26:10,873 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5086550.0, ans=0.125 2024-08-21 04:26:38,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5086650.0, ans=0.125 2024-08-21 04:26:42,415 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4900, loss[loss=0.08197, beats_loss=0.01083, ecapa_loss=0.0001542, whisper_loss=0.0696, over 16557.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.08879, over 3787885.24 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:26:54,378 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5086750.0, ans=0.125 2024-08-21 04:26:54,492 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5086750.0, ans=0.125 2024-08-21 04:26:56,571 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5086750.0, ans=0.2 2024-08-21 04:27:09,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5086850.0, ans=0.125 2024-08-21 04:27:18,969 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-21 04:27:34,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.59 vs. 
limit=15.0 2024-08-21 04:28:52,729 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 4950, loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001368, whisper_loss=0.08873, over 18111.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01049, ecapa_loss=0.00014, whisper_loss=0.08859, over 3814266.47 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:28:57,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-21 04:29:07,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5087250.0, ans=0.2 2024-08-21 04:29:30,931 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 30 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-21 04:29:45,155 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 04:29:47,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5087450.0, ans=0.2 2024-08-21 04:30:04,591 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 13 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 04:30:06,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5087450.0, ans=0.0 2024-08-21 04:30:14,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.296e+01 2.472e+01 2.859e+01 4.220e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 04:30:31,818 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 35 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-21 04:30:40,688 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.966e+00 2024-08-21 04:30:59,837 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 04:31:04,952 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5000, loss[loss=0.1027, beats_loss=0.01189, ecapa_loss=0.000143, whisper_loss=0.08935, over 21377.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.08924, over 3778103.33 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:31:15,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5087750.0, ans=0.1 2024-08-21 04:31:19,024 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-21 04:31:43,105 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. limit=6.0 2024-08-21 04:32:11,467 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-21 04:32:12,652 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5087950.0, ans=0.125 2024-08-21 04:33:08,081 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5050, loss[loss=0.09932, beats_loss=0.009678, ecapa_loss=0.0001559, whisper_loss=0.08808, over 20817.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001404, whisper_loss=0.08949, over 3785827.79 frames. ], batch size: 84, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:33:14,986 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5088250.0, ans=0.125 2024-08-21 04:33:16,785 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5088250.0, ans=0.125 2024-08-21 04:33:43,953 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
19 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 04:33:59,636 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5088450.0, ans=0.0 2024-08-21 04:34:24,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.200e+01 2.422e+01 2.716e+01 3.329e+01, threshold=4.844e+01, percent-clipped=0.0 2024-08-21 04:35:00,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5088650.0, ans=0.0 2024-08-21 04:35:10,906 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5100, loss[loss=0.1022, beats_loss=0.009431, ecapa_loss=0.0001549, whisper_loss=0.09122, over 21770.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001393, whisper_loss=0.08923, over 3768483.37 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:35:18,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5088750.0, ans=0.0 2024-08-21 04:35:38,737 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0 2024-08-21 04:35:44,651 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 04:36:13,034 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-21 04:36:29,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5089050.0, ans=0.125 2024-08-21 04:36:31,950 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
28 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-21 04:36:36,401 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5 2024-08-21 04:36:49,978 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 04:36:59,465 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5089150.0, ans=0.07 2024-08-21 04:37:05,334 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5150, loss[loss=0.1078, beats_loss=0.009435, ecapa_loss=0.0001314, whisper_loss=0.09704, over 22835.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08895, over 3757053.06 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:37:10,059 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 14 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-21 04:37:10,890 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5089250.0, ans=0.125 2024-08-21 04:37:29,597 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5089350.0, ans=0.1 2024-08-21 04:37:34,859 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 04:37:41,216 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-21 04:37:41,793 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 
21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 04:38:04,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5089450.0, ans=0.125 2024-08-21 04:38:12,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=15.0 2024-08-21 04:38:12,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.286e+01 2.550e+01 3.060e+01 1.523e+02, threshold=5.101e+01, percent-clipped=5.0 2024-08-21 04:38:17,386 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 11 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-21 04:38:22,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5089550.0, ans=0.125 2024-08-21 04:38:39,788 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5 2024-08-21 04:38:56,238 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5200, loss[loss=0.1301, beats_loss=0.007968, ecapa_loss=0.0001519, whisper_loss=0.1206, over 22546.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.08971, over 3749338.97 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:38:57,617 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.23 vs. 
limit=12.0 2024-08-21 04:39:03,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5089750.0, ans=0.0 2024-08-21 04:39:08,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5089750.0, ans=0.125 2024-08-21 04:39:18,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5089850.0, ans=0.125 2024-08-21 04:40:47,202 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5250, loss[loss=0.09369, beats_loss=0.01202, ecapa_loss=0.000123, whisper_loss=0.08044, over 19737.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.000141, whisper_loss=0.08984, over 3765800.12 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:41:04,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5090250.0, ans=0.0 2024-08-21 04:41:04,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5090250.0, ans=0.125 2024-08-21 04:41:12,586 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 04:41:18,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5090350.0, ans=0.125 2024-08-21 04:41:34,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5090450.0, ans=0.2 2024-08-21 04:41:58,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.269e+01 2.486e+01 2.907e+01 3.986e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-21 04:41:58,322 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 04:41:59,645 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5090550.0, ans=0.125 2024-08-21 04:42:00,628 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5090550.0, ans=0.125 2024-08-21 04:42:14,036 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5090550.0, ans=0.5 2024-08-21 04:42:24,310 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 04:42:25,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5090650.0, ans=0.125 2024-08-21 04:42:42,842 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5300, loss[loss=0.09759, beats_loss=0.00739, ecapa_loss=0.0001987, whisper_loss=0.08821, over 14755.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01027, ecapa_loss=0.0001398, whisper_loss=0.0901, over 3790172.30 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:42:44,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=15.0 2024-08-21 04:42:47,308 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 04:43:32,351 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=5090950.0, ans=0.05 2024-08-21 04:43:46,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5090950.0, ans=0.1 2024-08-21 04:43:53,103 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
26 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 04:43:54,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5091050.0, ans=0.5 2024-08-21 04:44:01,771 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5091050.0, ans=0.1 2024-08-21 04:44:04,884 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 18 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-21 04:44:09,698 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 04:44:22,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2024-08-21 04:44:27,627 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 04:44:42,687 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5350, loss[loss=0.1148, beats_loss=0.01041, ecapa_loss=0.0001422, whisper_loss=0.103, over 22291.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001385, whisper_loss=0.09003, over 3738546.65 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:44:47,062 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 04:44:53,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. 
limit=15.0 2024-08-21 04:44:56,454 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5091250.0, ans=0.125 2024-08-21 04:44:56,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5091250.0, ans=0.125 2024-08-21 04:46:00,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.175e+01 2.396e+01 2.648e+01 3.168e+01, threshold=4.792e+01, percent-clipped=0.0 2024-08-21 04:46:00,252 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 36 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 04:46:20,444 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-21 04:46:22,543 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 04:46:33,765 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 16 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 04:46:40,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5091650.0, ans=0.2 2024-08-21 04:46:48,421 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5400, loss[loss=0.09735, beats_loss=0.01061, ecapa_loss=0.0001498, whisper_loss=0.08524, over 20926.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01033, ecapa_loss=0.0001389, whisper_loss=0.0898, over 3758499.74 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:46:58,275 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 27 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-21 04:47:30,845 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-21 04:48:57,116 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5450, loss[loss=0.09288, beats_loss=0.01085, ecapa_loss=0.0001245, whisper_loss=0.08079, over 18745.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001395, whisper_loss=0.08936, over 3749030.99 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:49:01,214 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5092250.0, ans=0.125 2024-08-21 04:49:07,993 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-21 04:49:29,303 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 04:49:31,635 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5092350.0, ans=0.0 2024-08-21 04:49:36,893 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0 2024-08-21 04:50:00,308 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=12.0 2024-08-21 04:50:18,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.233e+01 2.525e+01 2.938e+01 2.405e+02, threshold=5.050e+01, percent-clipped=4.0 2024-08-21 04:50:46,601 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 04:51:08,507 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5092750.0, ans=0.125 2024-08-21 04:51:09,334 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5500, loss[loss=0.08893, beats_loss=0.01225, ecapa_loss=0.0001403, whisper_loss=0.07527, over 22611.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.08898, over 3755864.65 frames. 
], batch size: 92, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:51:18,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5092750.0, ans=0.2 2024-08-21 04:51:21,971 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5092750.0, ans=0.125 2024-08-21 04:51:35,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5092850.0, ans=0.0 2024-08-21 04:51:41,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5092850.0, ans=0.0 2024-08-21 04:51:57,993 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-21 04:52:40,713 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5093050.0, ans=0.1 2024-08-21 04:52:41,778 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 04:53:06,967 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-21 04:53:20,893 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5550, loss[loss=0.1064, beats_loss=0.01182, ecapa_loss=0.0001575, whisper_loss=0.09303, over 22746.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001389, whisper_loss=0.08962, over 3823671.49 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:54:11,295 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
32 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 04:54:18,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5093450.0, ans=0.125 2024-08-21 04:54:48,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.248e+01 2.482e+01 2.824e+01 3.933e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-21 04:55:00,507 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 04:55:04,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5093550.0, ans=0.125 2024-08-21 04:55:06,638 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5093650.0, ans=0.125 2024-08-21 04:55:10,494 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 04:55:17,646 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 04:55:24,285 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-21 04:55:27,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5093650.0, ans=0.125 2024-08-21 04:55:30,821 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5093650.0, ans=0.0 2024-08-21 04:55:33,850 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5600, loss[loss=0.1279, beats_loss=0.009892, ecapa_loss=0.0001157, whisper_loss=0.1169, over 23949.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.000139, whisper_loss=0.08981, over 3819162.56 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:55:57,251 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5093750.0, ans=0.04949747468305833 2024-08-21 04:56:18,138 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5093850.0, ans=0.125 2024-08-21 04:56:20,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5093850.0, ans=0.07 2024-08-21 04:56:23,131 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5093850.0, ans=0.2 2024-08-21 04:56:53,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0 2024-08-21 04:57:08,715 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 29 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-21 04:57:35,947 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5650, loss[loss=0.07773, beats_loss=0.0115, ecapa_loss=0.0001625, whisper_loss=0.0646, over 20647.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001399, whisper_loss=0.08975, over 3865201.00 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:57:41,441 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5094250.0, ans=0.0 2024-08-21 04:58:02,257 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 28 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-21 04:58:23,712 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5094450.0, ans=0.95 2024-08-21 04:58:31,295 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
12 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 04:58:49,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.209e+01 2.456e+01 2.705e+01 6.075e+01, threshold=4.911e+01, percent-clipped=1.0 2024-08-21 04:58:57,946 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 04:59:02,338 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5094550.0, ans=0.1 2024-08-21 04:59:03,354 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-21 04:59:34,814 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5700, loss[loss=0.1148, beats_loss=0.007955, ecapa_loss=0.0001749, whisper_loss=0.1051, over 19741.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001399, whisper_loss=0.09016, over 3844225.95 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:59:41,368 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 36 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-21 04:59:44,835 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5094750.0, ans=0.125 2024-08-21 04:59:57,583 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5094850.0, ans=0.1 2024-08-21 05:00:33,048 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-21 05:00:37,910 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 32 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-21 05:01:19,105 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-21 05:01:25,337 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5750, loss[loss=0.09337, beats_loss=0.009904, ecapa_loss=0.000171, whisper_loss=0.08175, over 21106.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001401, whisper_loss=0.09008, over 3863131.84 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:01:55,981 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-21 05:02:03,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5095350.0, ans=0.125 2024-08-21 05:02:18,419 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2024-08-21 05:02:38,237 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5095550.0, ans=0.125 2024-08-21 05:02:38,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.497e+01 2.736e+01 4.299e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-21 05:02:48,575 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=12.0 2024-08-21 05:02:55,384 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 05:02:58,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=5095650.0, ans=0.95 2024-08-21 05:03:10,613 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5095650.0, ans=0.0 2024-08-21 05:03:20,582 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5800, loss[loss=0.1167, beats_loss=0.008002, ecapa_loss=0.0001299, whisper_loss=0.1074, over 13692.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001399, whisper_loss=0.09018, over 3836139.81 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:03:23,447 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 05:04:03,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-21 05:04:05,431 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2024-08-21 05:04:30,628 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 05:04:44,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5096050.0, ans=0.1 2024-08-21 05:05:03,886 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 05:05:10,728 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. 
limit=15.0 2024-08-21 05:05:11,383 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5850, loss[loss=0.09616, beats_loss=0.01233, ecapa_loss=0.0001727, whisper_loss=0.08211, over 21316.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001403, whisper_loss=0.09052, over 3845895.45 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:05:47,619 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5096350.0, ans=0.1 2024-08-21 05:05:52,900 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-21 05:06:14,219 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-21 05:06:14,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.298e+01 2.570e+01 2.824e+01 3.912e+01, threshold=5.140e+01, percent-clipped=0.0 2024-08-21 05:06:50,866 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5900, loss[loss=0.07433, beats_loss=0.01233, ecapa_loss=0.0001155, whisper_loss=0.06084, over 16217.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.08994, over 3842720.58 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:06:56,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5096750.0, ans=0.125 2024-08-21 05:07:07,595 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5096750.0, ans=0.125 2024-08-21 05:07:11,423 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. 
limit=12.0 2024-08-21 05:07:12,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5096850.0, ans=0.125 2024-08-21 05:07:22,302 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-21 05:07:31,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5096950.0, ans=0.0 2024-08-21 05:07:32,959 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-21 05:07:36,397 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5096950.0, ans=0.05 2024-08-21 05:07:48,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5096950.0, ans=0.1 2024-08-21 05:08:20,220 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5097150.0, ans=0.125 2024-08-21 05:08:22,259 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5097150.0, ans=0.125 2024-08-21 05:08:29,548 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 05:08:31,359 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-21 05:08:35,846 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 5950, loss[loss=0.1058, beats_loss=0.00839, ecapa_loss=0.0001298, whisper_loss=0.09611, over 19294.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.08949, over 3824991.54 frames. 
], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:08:53,740 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5097250.0, ans=0.125 2024-08-21 05:09:30,692 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-21 05:09:35,618 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.56 vs. limit=6.0 2024-08-21 05:09:43,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.327e+01 2.629e+01 2.901e+01 4.645e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-21 05:09:49,314 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5097550.0, ans=0.0 2024-08-21 05:09:55,527 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5097550.0, ans=0.0 2024-08-21 05:09:56,580 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 05:10:02,832 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5097650.0, ans=0.125 2024-08-21 05:10:14,675 INFO [train_multi_KD3.py:845] (2/4) A total of 49 cuts. 15 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-21 05:10:16,080 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6000, loss[loss=0.09422, beats_loss=0.008538, ecapa_loss=0.0001062, whisper_loss=0.08462, over 12987.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001405, whisper_loss=0.0895, over 3805236.46 frames. 
], batch size: 49, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:10:16,081 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-21 05:10:53,947 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005022, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 05:11:19,538 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on SV_voxceleb1: loss=0.003907, beats_loss=0, ecapa_loss=0.0003907, whisper_loss=0, over 944235.00 frames. 2024-08-21 05:11:58,455 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7007, 1.7544, 1.7567, 1.6464], device='cuda:2') 2024-08-21 05:13:03,024 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 05:13:03,033 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-21 05:13:17,453 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-21 05:13:49,628 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=12.0 2024-08-21 05:13:55,408 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5097950.0, ans=0.125 2024-08-21 05:14:00,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5098050.0, ans=0.09899494936611666 2024-08-21 05:14:03,356 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. 
limit=15.0 2024-08-21 05:14:17,653 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5098150.0, ans=0.125 2024-08-21 05:14:34,871 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-21 05:14:37,231 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6050, loss[loss=0.08328, beats_loss=0.01291, ecapa_loss=0.0001344, whisper_loss=0.06902, over 17745.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.08916, over 3814614.25 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:14:38,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5098250.0, ans=0.1 2024-08-21 05:15:18,770 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 28 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-21 05:15:23,099 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 26 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-21 05:15:23,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5098450.0, ans=0.125 2024-08-21 05:15:24,783 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-21 05:15:29,365 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 37 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 05:15:37,793 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 05:15:39,462 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.180e+01 2.501e+01 2.659e+01 4.695e+01, threshold=5.002e+01, percent-clipped=0.0 2024-08-21 05:15:54,240 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5098650.0, ans=0.125 2024-08-21 05:15:58,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5098650.0, ans=0.0 2024-08-21 05:16:12,976 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6100, loss[loss=0.1134, beats_loss=0.01024, ecapa_loss=0.0001472, whisper_loss=0.1017, over 16428.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001404, whisper_loss=0.08912, over 3807003.24 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:16:17,973 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5098750.0, ans=0.2 2024-08-21 05:16:24,665 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.99 vs. limit=15.0 2024-08-21 05:16:25,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5098750.0, ans=0.0 2024-08-21 05:16:30,717 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5098850.0, ans=0.125 2024-08-21 05:16:51,816 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
20 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-21 05:17:01,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5098950.0, ans=0.0 2024-08-21 05:17:26,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5099050.0, ans=0.0 2024-08-21 05:17:40,774 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 05:17:52,000 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6150, loss[loss=0.0774, beats_loss=0.01145, ecapa_loss=0.0001105, whisper_loss=0.06485, over 13780.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.08924, over 3816263.00 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:17:55,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5099250.0, ans=0.125 2024-08-21 05:17:57,518 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-08-21 05:18:01,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5099250.0, ans=0.1 2024-08-21 05:18:23,823 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5099350.0, ans=0.125 2024-08-21 05:18:23,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5099350.0, ans=0.0 2024-08-21 05:18:26,468 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-21 05:18:34,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=5099450.0, ans=0.02 2024-08-21 05:18:41,110 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-21 05:18:50,114 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-21 05:18:50,805 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5099550.0, ans=0.125 2024-08-21 05:18:53,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.269e+01 2.480e+01 2.871e+01 4.819e+02, threshold=4.960e+01, percent-clipped=2.0 2024-08-21 05:18:56,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5099550.0, ans=0.125 2024-08-21 05:18:57,104 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 14 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-21 05:19:06,741 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5099650.0, ans=0.125 2024-08-21 05:19:11,477 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-21 05:19:17,288 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 05:19:26,653 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6200, loss[loss=0.08266, beats_loss=0.01236, ecapa_loss=0.0001424, whisper_loss=0.06887, over 16188.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01055, ecapa_loss=0.0001388, whisper_loss=0.08889, over 3832694.10 frames. ], batch size: 68, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:20:10,294 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
15 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 05:20:13,103 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-21 05:20:24,456 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 05:20:31,523 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:20:37,177 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 29 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-21 05:21:06,134 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 18 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-21 05:21:08,197 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6250, loss[loss=0.1046, beats_loss=0.009853, ecapa_loss=9.876e-05, whisper_loss=0.09374, over 13401.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001389, whisper_loss=0.08937, over 3841945.39 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:21:11,023 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5100250.0, ans=0.0 2024-08-21 05:21:27,761 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 05:21:47,452 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 38 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 05:21:50,562 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-08-21 05:21:58,756 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 05:22:01,075 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 05:22:12,584 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.226e+01 2.520e+01 2.834e+01 9.847e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-21 05:22:21,881 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-08-21 05:22:30,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5100650.0, ans=0.1 2024-08-21 05:22:44,834 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-21 05:22:49,577 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6300, loss[loss=0.1069, beats_loss=0.009717, ecapa_loss=0.0001543, whisper_loss=0.09564, over 17885.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.08988, over 3873519.01 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:23:01,739 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.087e+00 2024-08-21 05:23:19,675 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 05:23:23,293 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
35 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 05:23:38,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5100950.0, ans=0.125 2024-08-21 05:23:47,418 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5101050.0, ans=0.2 2024-08-21 05:24:08,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5101150.0, ans=0.0 2024-08-21 05:24:08,956 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=12.0 2024-08-21 05:24:13,262 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 05:24:20,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5101150.0, ans=0.125 2024-08-21 05:24:24,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5101150.0, ans=0.1 2024-08-21 05:24:26,801 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6350, loss[loss=0.08441, beats_loss=0.01117, ecapa_loss=0.0001745, whisper_loss=0.0715, over 18774.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.08975, over 3878030.98 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:24:41,442 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 21 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-21 05:24:48,905 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 
22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-21 05:24:53,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5101350.0, ans=0.125 2024-08-21 05:24:57,656 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-08-21 05:25:19,449 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5101450.0, ans=0.0 2024-08-21 05:25:19,509 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5101450.0, ans=0.2 2024-08-21 05:25:30,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.267e+01 2.496e+01 2.800e+01 3.336e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-21 05:25:30,393 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 05:25:31,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5101550.0, ans=0.0 2024-08-21 05:25:46,024 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5101650.0, ans=0.125 2024-08-21 05:25:57,122 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5101650.0, ans=0.0 2024-08-21 05:26:04,616 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6400, loss[loss=0.09417, beats_loss=0.009382, ecapa_loss=0.0001464, whisper_loss=0.08332, over 19585.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.00014, whisper_loss=0.08993, over 3860449.01 frames. 
], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:26:05,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5101750.0, ans=0.0 2024-08-21 05:26:13,260 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0562300942838192, model_norm_threshold=49.92792510986328 2024-08-21 05:26:13,429 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.1.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.997e+04, grad_sumsq=6.997e+04, orig_rms_sq=1.000e+00 2024-08-21 05:26:14,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5101750.0, ans=0.125 2024-08-21 05:26:16,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5101750.0, ans=0.0 2024-08-21 05:26:37,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5101850.0, ans=0.125 2024-08-21 05:26:54,191 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-21 05:26:57,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5101950.0, ans=0.04949747468305833 2024-08-21 05:26:57,233 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5101950.0, ans=0.125 2024-08-21 05:27:06,706 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5102050.0, ans=0.1 2024-08-21 05:27:08,713 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:27:14,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5102050.0, ans=0.1 2024-08-21 05:27:21,230 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-21 05:27:24,664 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5102150.0, ans=0.0 2024-08-21 05:27:37,082 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6450, loss[loss=0.1078, beats_loss=0.00975, ecapa_loss=0.0001119, whisper_loss=0.09693, over 15583.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01028, ecapa_loss=0.0001403, whisper_loss=0.09079, over 3805083.95 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:27:39,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5102250.0, ans=0.125 2024-08-21 05:27:55,164 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-21 05:28:04,635 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 05:28:05,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5102350.0, ans=0.0 2024-08-21 05:28:08,907 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 16 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-21 05:28:09,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=5102350.0, ans=0.025 2024-08-21 05:28:30,379 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-21 05:28:34,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5102550.0, ans=0.125 2024-08-21 05:28:34,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.209e+01 2.498e+01 2.911e+01 8.879e+02, threshold=4.995e+01, percent-clipped=1.0 2024-08-21 05:28:37,525 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 05:28:49,179 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5102650.0, ans=0.125 2024-08-21 05:29:07,903 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6500, loss[loss=0.08051, beats_loss=0.01294, ecapa_loss=0.0001585, whisper_loss=0.06598, over 21074.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001401, whisper_loss=0.09042, over 3819484.95 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:29:08,304 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 05:29:19,944 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 05:29:27,512 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 05:29:57,387 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2024-08-21 05:30:05,234 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5102950.0, ans=0.125 2024-08-21 05:30:10,814 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-21 05:30:11,586 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5103050.0, ans=0.0 2024-08-21 05:30:23,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5103050.0, ans=0.125 2024-08-21 05:30:36,350 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2024-08-21 05:30:39,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5103150.0, ans=0.0 2024-08-21 05:30:43,213 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5103150.0, ans=0.0 2024-08-21 05:30:50,205 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6550, loss[loss=0.1174, beats_loss=0.009304, ecapa_loss=0.0001328, whisper_loss=0.1068, over 22409.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01028, ecapa_loss=0.0001409, whisper_loss=0.09086, over 3860729.35 frames. 
], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:30:54,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5103250.0, ans=0.125 2024-08-21 05:31:02,125 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5103250.0, ans=0.0 2024-08-21 05:31:30,336 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 34 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 05:31:32,260 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 13 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-21 05:31:49,039 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 21 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-21 05:31:56,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5103550.0, ans=0.2 2024-08-21 05:31:59,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.280e+01 2.467e+01 2.785e+01 3.437e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-21 05:32:34,104 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6600, loss[loss=0.1174, beats_loss=0.01103, ecapa_loss=0.0001449, whisper_loss=0.1049, over 21248.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01024, ecapa_loss=0.0001414, whisper_loss=0.0912, over 3866615.14 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:32:35,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5103750.0, ans=0.125 2024-08-21 05:32:40,256 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. 
limit=6.0 2024-08-21 05:33:01,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5103850.0, ans=0.05 2024-08-21 05:33:05,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5103850.0, ans=0.0 2024-08-21 05:33:05,750 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5103850.0, ans=0.5 2024-08-21 05:33:09,354 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-21 05:33:22,446 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.69 vs. limit=10.0 2024-08-21 05:33:50,009 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5104050.0, ans=0.125 2024-08-21 05:33:59,454 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2024-08-21 05:34:13,042 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6650, loss[loss=0.08474, beats_loss=0.01342, ecapa_loss=0.0001395, whisper_loss=0.06992, over 19015.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0102, ecapa_loss=0.000141, whisper_loss=0.09186, over 3882072.45 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:34:13,281 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 
18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 05:34:43,064 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5104350.0, ans=0.125 2024-08-21 05:34:50,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5104350.0, ans=0.125 2024-08-21 05:34:51,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5104450.0, ans=0.1 2024-08-21 05:34:55,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5104450.0, ans=0.125 2024-08-21 05:35:15,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.320e+01 2.541e+01 2.816e+01 4.422e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-21 05:35:48,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5104650.0, ans=0.07 2024-08-21 05:35:50,868 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2024-08-21 05:35:51,148 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6700, loss[loss=0.1039, beats_loss=0.01043, ecapa_loss=0.0001731, whisper_loss=0.09172, over 22637.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01029, ecapa_loss=0.0001411, whisper_loss=0.09088, over 3876686.49 frames. ], batch size: 95, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:36:26,151 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5104850.0, ans=0.125 2024-08-21 05:36:34,770 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
26 from LS+wenet, 36 from Vox, 31 fro AS 2024-08-21 05:36:37,197 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=12.0 2024-08-21 05:36:46,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5104950.0, ans=0.1 2024-08-21 05:36:57,808 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5105050.0, ans=0.2 2024-08-21 05:37:27,727 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6750, loss[loss=0.1042, beats_loss=0.009867, ecapa_loss=0.0001689, whisper_loss=0.09269, over 12161.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01026, ecapa_loss=0.0001424, whisper_loss=0.09112, over 3845452.96 frames. ], batch size: 49, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:37:30,418 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-21 05:37:40,155 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5105250.0, ans=0.0 2024-08-21 05:37:41,135 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 05:37:47,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5105350.0, ans=0.0 2024-08-21 05:38:12,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5105450.0, ans=0.1 2024-08-21 05:38:20,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. 
limit=15.0 2024-08-21 05:38:25,765 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.375e+01 2.599e+01 2.848e+01 3.757e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-21 05:38:27,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5105550.0, ans=0.125 2024-08-21 05:38:28,841 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:38:44,829 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-21 05:38:57,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5105650.0, ans=0.0 2024-08-21 05:38:59,460 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6800, loss[loss=0.09432, beats_loss=0.01112, ecapa_loss=0.0001334, whisper_loss=0.08187, over 19377.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01021, ecapa_loss=0.0001419, whisper_loss=0.09129, over 3835703.11 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:39:01,775 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 05:40:10,003 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5106050.0, ans=0.125 2024-08-21 05:40:18,171 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 
24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-21 05:40:20,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5106150.0, ans=0.1 2024-08-21 05:40:27,557 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5106150.0, ans=0.125 2024-08-21 05:40:33,859 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6850, loss[loss=0.09702, beats_loss=0.01129, ecapa_loss=0.0001063, whisper_loss=0.08466, over 13360.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01024, ecapa_loss=0.0001405, whisper_loss=0.09121, over 3834973.49 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:40:43,938 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-21 05:40:48,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5106250.0, ans=0.0 2024-08-21 05:40:48,889 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-08-21 05:41:26,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5106450.0, ans=0.125 2024-08-21 05:41:32,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.327e+01 2.654e+01 2.944e+01 2.744e+02, threshold=5.308e+01, percent-clipped=2.0 2024-08-21 05:41:43,508 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-21 05:41:45,758 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 05:41:49,112 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
17 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-21 05:41:51,944 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5106650.0, ans=0.2 2024-08-21 05:41:51,957 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5106650.0, ans=0.125 2024-08-21 05:41:59,770 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5106650.0, ans=0.0 2024-08-21 05:42:05,958 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6900, loss[loss=0.1143, beats_loss=0.009242, ecapa_loss=0.0001327, whisper_loss=0.1037, over 22834.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001405, whisper_loss=0.09123, over 3795470.53 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:42:06,193 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-21 05:42:19,501 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-08-21 05:42:24,485 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.27 vs. limit=22.5 2024-08-21 05:42:26,300 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=5106850.0, ans=0.5 2024-08-21 05:42:27,327 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 05:42:29,112 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-21 05:42:49,302 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5106950.0, ans=0.125 2024-08-21 05:42:54,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5106950.0, ans=0.0 2024-08-21 05:43:10,254 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 05:43:11,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2024-08-21 05:43:27,295 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5107150.0, ans=0.0 2024-08-21 05:43:34,707 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5107250.0, ans=0.2 2024-08-21 05:43:35,497 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 6950, loss[loss=0.06794, beats_loss=0.01256, ecapa_loss=0.0001214, whisper_loss=0.05417, over 14312.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001389, whisper_loss=0.09107, over 3814064.79 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:43:37,373 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-21 05:43:50,657 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5107250.0, ans=0.125 2024-08-21 05:44:15,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5107450.0, ans=0.125 2024-08-21 05:44:19,472 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5107450.0, ans=0.125 2024-08-21 05:44:23,452 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.045e+00 2024-08-21 05:44:26,111 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5107450.0, ans=0.125 2024-08-21 05:44:32,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.277e+01 2.529e+01 2.921e+01 4.469e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 05:44:44,049 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 05:44:51,564 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-21 05:45:06,284 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7000, loss[loss=0.0885, beats_loss=0.01081, ecapa_loss=0.0001678, whisper_loss=0.07601, over 22107.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.0904, over 3819346.56 frames. 
], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:45:09,019 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5107750.0, ans=0.125 2024-08-21 05:45:16,763 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5107750.0, ans=0.125 2024-08-21 05:45:18,485 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5107750.0, ans=0.2 2024-08-21 05:46:08,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5108050.0, ans=0.125 2024-08-21 05:46:15,312 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5108050.0, ans=0.125 2024-08-21 05:46:22,933 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5108150.0, ans=0.125 2024-08-21 05:46:28,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5108150.0, ans=0.125 2024-08-21 05:46:29,141 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 22 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 05:46:32,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5108150.0, ans=0.2 2024-08-21 05:46:38,981 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7050, loss[loss=0.1187, beats_loss=0.009989, ecapa_loss=0.0001444, whisper_loss=0.1073, over 14559.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.08971, over 3811332.55 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:46:56,214 WARNING [optim.py:496] (2/4) Scaling gradients by 0.049437928944826126, model_norm_threshold=50.57056427001953 2024-08-21 05:46:56,390 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.007e+05, grad_sumsq=1.862e+07, orig_rms_sq=1.077e-02 2024-08-21 05:46:56,659 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 34 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-21 05:47:08,868 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 31 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 05:47:09,688 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5108350.0, ans=0.125 2024-08-21 05:47:12,310 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 05:47:17,295 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 05:47:28,820 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5108450.0, ans=0.0 2024-08-21 05:47:34,187 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5108550.0, ans=0.1 2024-08-21 05:47:34,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.310e+01 2.518e+01 2.782e+01 1.023e+03, threshold=5.036e+01, percent-clipped=2.0 2024-08-21 05:47:52,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5108650.0, ans=0.125 2024-08-21 05:47:57,144 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
20 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-21 05:47:58,434 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07700152695178986, model_norm_threshold=50.36127471923828 2024-08-21 05:47:58,602 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.694e+04, grad_sumsq=7.694e+04, orig_rms_sq=1.000e+00 2024-08-21 05:47:59,675 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-21 05:48:07,496 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7100, loss[loss=0.09949, beats_loss=0.01073, ecapa_loss=0.0001372, whisper_loss=0.0874, over 20425.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001389, whisper_loss=0.09078, over 3817953.40 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:48:12,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5108750.0, ans=0.125 2024-08-21 05:48:19,196 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5108750.0, ans=0.2 2024-08-21 05:48:22,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5108750.0, ans=0.0 2024-08-21 05:48:34,090 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-08-21 05:48:45,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2024-08-21 05:48:54,336 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
14 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 05:48:57,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.18 vs. limit=22.5 2024-08-21 05:49:00,416 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=5109050.0, ans=0.2 2024-08-21 05:49:01,340 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-21 05:49:06,206 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5109050.0, ans=0.0 2024-08-21 05:49:09,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5109050.0, ans=0.125 2024-08-21 05:49:20,975 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 35 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-21 05:49:23,029 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5109150.0, ans=0.125 2024-08-21 05:49:36,241 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7150, loss[loss=0.1203, beats_loss=0.008215, ecapa_loss=0.0001465, whisper_loss=0.1106, over 18419.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001378, whisper_loss=0.091, over 3803759.10 frames. 
], batch size: 72, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:49:40,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5109250.0, ans=0.125 2024-08-21 05:49:49,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5109250.0, ans=0.0 2024-08-21 05:50:10,204 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5109450.0, ans=0.125 2024-08-21 05:50:20,758 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-21 05:50:32,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.268e+01 2.455e+01 2.616e+01 6.540e+02, threshold=4.909e+01, percent-clipped=2.0 2024-08-21 05:50:41,203 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5109550.0, ans=0.125 2024-08-21 05:50:48,974 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5109650.0, ans=0.0 2024-08-21 05:50:54,325 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5109650.0, ans=0.2 2024-08-21 05:51:01,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5109650.0, ans=0.125 2024-08-21 05:51:05,958 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7200, loss[loss=0.09762, beats_loss=0.009993, ecapa_loss=0.000129, whisper_loss=0.08633, over 18050.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001381, whisper_loss=0.0903, over 3785632.66 frames. 
], batch size: 72, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:51:14,939 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5109750.0, ans=0.125 2024-08-21 05:51:16,567 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5109750.0, ans=0.125 2024-08-21 05:51:26,099 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5109850.0, ans=0.125 2024-08-21 05:51:42,519 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-21 05:51:54,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5109950.0, ans=0.125 2024-08-21 05:52:13,526 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5110050.0, ans=0.5 2024-08-21 05:52:20,737 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5110150.0, ans=0.125 2024-08-21 05:52:24,133 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5110150.0, ans=0.1 2024-08-21 05:52:37,644 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7250, loss[loss=0.1326, beats_loss=0.007867, ecapa_loss=0.0001687, whisper_loss=0.123, over 17193.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001385, whisper_loss=0.09043, over 3814636.99 frames. ], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:52:56,954 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 05:52:59,614 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-21 05:53:00,409 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 05:53:11,684 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5110350.0, ans=0.2 2024-08-21 05:53:20,468 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5110450.0, ans=0.125 2024-08-21 05:53:34,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5110550.0, ans=0.125 2024-08-21 05:53:35,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.241e+01 2.447e+01 2.768e+01 8.311e+01, threshold=4.894e+01, percent-clipped=2.0 2024-08-21 05:53:40,147 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5110550.0, ans=0.1 2024-08-21 05:53:46,311 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=12.0 2024-08-21 05:53:51,554 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. 
limit=15.0 2024-08-21 05:53:53,202 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5110650.0, ans=0.0 2024-08-21 05:53:54,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5110650.0, ans=0.1 2024-08-21 05:54:02,849 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 19 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-21 05:54:07,922 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7300, loss[loss=0.09114, beats_loss=0.009713, ecapa_loss=0.0001536, whisper_loss=0.07989, over 20747.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001387, whisper_loss=0.09051, over 3833498.61 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:54:42,669 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5110950.0, ans=0.125 2024-08-21 05:54:56,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5110950.0, ans=0.2 2024-08-21 05:54:57,561 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 05:54:59,675 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-21 05:55:03,686 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5111050.0, ans=0.0 2024-08-21 05:55:13,218 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 05:55:19,139 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 
20 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-21 05:55:19,715 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5111150.0, ans=0.125 2024-08-21 05:55:24,181 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 05:55:25,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5111150.0, ans=0.125 2024-08-21 05:55:36,593 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7350, loss[loss=0.1097, beats_loss=0.009577, ecapa_loss=0.0001512, whisper_loss=0.09857, over 22774.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001372, whisper_loss=0.09047, over 3842028.61 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:55:37,566 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5111250.0, ans=0.0 2024-08-21 05:56:09,277 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 05:56:27,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5111450.0, ans=0.125 2024-08-21 05:56:27,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5111450.0, ans=0.125 2024-08-21 05:56:30,154 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 05:56:33,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.293e+01 2.578e+01 2.831e+01 4.096e+01, threshold=5.157e+01, percent-clipped=0.0 2024-08-21 05:56:42,765 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5111550.0, ans=0.1 2024-08-21 05:56:57,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5111650.0, ans=0.2 2024-08-21 05:57:04,677 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7400, loss[loss=0.0961, beats_loss=0.01265, ecapa_loss=0.0001431, whisper_loss=0.08202, over 20335.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001373, whisper_loss=0.09038, over 3790888.74 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:57:27,315 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-08-21 05:57:40,928 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08977154642343521, model_norm_threshold=51.56612014770508 2024-08-21 05:57:41,098 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.273e+04, grad_sumsq=4.273e+04, orig_rms_sq=1.000e+00 2024-08-21 05:57:52,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5111950.0, ans=0.125 2024-08-21 05:57:57,576 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 21 from LS+wenet, 14 from Vox, 15 fro AS 2024-08-21 05:58:18,622 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-21 05:58:34,085 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7450, loss[loss=0.09805, beats_loss=0.0114, ecapa_loss=0.0001253, whisper_loss=0.0854, over 22840.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001381, whisper_loss=0.09092, over 3820875.17 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:58:38,221 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 05:59:10,875 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5112450.0, ans=0.125 2024-08-21 05:59:31,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.315e+01 2.613e+01 3.029e+01 5.744e+02, threshold=5.226e+01, percent-clipped=1.0 2024-08-21 05:59:51,883 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-08-21 05:59:57,641 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2024-08-21 06:00:03,585 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7500, loss[loss=0.1162, beats_loss=0.009213, ecapa_loss=0.0001254, whisper_loss=0.1058, over 23250.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001389, whisper_loss=0.09061, over 3836119.01 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:00:21,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5112850.0, ans=0.125 2024-08-21 06:00:27,953 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 
26 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-21 06:00:58,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5113050.0, ans=0.0 2024-08-21 06:01:11,182 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5113050.0, ans=0.125 2024-08-21 06:01:34,460 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7550, loss[loss=0.1037, beats_loss=0.01059, ecapa_loss=0.0001204, whisper_loss=0.09193, over 22417.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001387, whisper_loss=0.09058, over 3817957.66 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 06:01:56,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5113350.0, ans=0.0 2024-08-21 06:02:18,627 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5113450.0, ans=0.2 2024-08-21 06:02:18,663 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5113450.0, ans=0.125 2024-08-21 06:02:23,787 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5113450.0, ans=0.0 2024-08-21 06:02:26,933 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-21 06:02:28,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5113450.0, ans=0.1 2024-08-21 06:02:36,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.241e+01 2.500e+01 2.791e+01 3.634e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-21 06:02:43,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5113550.0, ans=0.2 2024-08-21 06:02:48,972 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5 2024-08-21 06:02:59,568 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5113650.0, ans=0.1 2024-08-21 06:03:07,992 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7600, loss[loss=0.0969, beats_loss=0.01049, ecapa_loss=0.0001366, whisper_loss=0.08505, over 20895.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01027, ecapa_loss=0.0001381, whisper_loss=0.09201, over 3849067.26 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:03:34,896 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5113850.0, ans=0.125 2024-08-21 06:03:35,013 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5113850.0, ans=0.125 2024-08-21 06:03:39,893 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 06:03:52,902 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-21 06:03:54,942 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 06:04:09,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5114050.0, ans=0.2 2024-08-21 06:04:17,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5114050.0, ans=0.0 2024-08-21 06:04:25,323 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5114150.0, ans=0.2 2024-08-21 06:04:31,328 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5114150.0, ans=0.2 2024-08-21 06:04:31,361 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5114150.0, ans=0.125 2024-08-21 06:04:42,292 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7650, loss[loss=0.1066, beats_loss=0.01047, ecapa_loss=0.0001159, whisper_loss=0.095, over 19001.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01027, ecapa_loss=0.0001389, whisper_loss=0.0914, over 3849419.32 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:04:45,291 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5114250.0, ans=0.0 2024-08-21 06:04:55,667 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 34 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-21 06:05:01,362 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5114350.0, ans=0.125 2024-08-21 06:05:01,445 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5114350.0, ans=0.2 2024-08-21 06:05:03,863 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.67 vs. 
limit=15.0 2024-08-21 06:05:25,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5114450.0, ans=0.2 2024-08-21 06:05:27,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5114450.0, ans=0.07 2024-08-21 06:05:42,896 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.478e+01 2.742e+01 4.351e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-21 06:05:52,534 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-21 06:05:52,662 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5114550.0, ans=0.125 2024-08-21 06:05:56,841 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5114650.0, ans=0.125 2024-08-21 06:06:13,472 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7700, loss[loss=0.1138, beats_loss=0.008585, ecapa_loss=0.0001418, whisper_loss=0.1038, over 19207.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.0001389, whisper_loss=0.09063, over 3815400.30 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:06:44,639 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5114850.0, ans=0.0 2024-08-21 06:06:59,884 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5114950.0, ans=0.2 2024-08-21 06:07:02,141 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5114950.0, ans=0.2 2024-08-21 06:07:06,654 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
25 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 06:07:38,993 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5115150.0, ans=0.0 2024-08-21 06:07:54,669 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2024-08-21 06:07:58,855 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7750, loss[loss=0.09145, beats_loss=0.01142, ecapa_loss=0.0001625, whisper_loss=0.0784, over 22825.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.09043, over 3801723.78 frames. ], batch size: 96, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:08:17,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5115250.0, ans=0.0 2024-08-21 06:08:25,081 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 06:08:27,367 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 17 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-21 06:08:30,826 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 06:08:40,658 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 06:08:47,512 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5115450.0, ans=0.0 2024-08-21 06:08:56,702 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5115550.0, ans=0.0 2024-08-21 06:09:03,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.270e+01 2.577e+01 2.902e+01 8.135e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-21 06:09:04,960 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 
21 from LS+wenet, 11 from Vox, 18 fro AS 2024-08-21 06:09:34,806 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7800, loss[loss=0.08791, beats_loss=0.009639, ecapa_loss=0.0001351, whisper_loss=0.07692, over 18558.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.09018, over 3797436.71 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:09:36,652 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 06:09:55,465 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 25 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 06:09:59,428 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 06:10:10,228 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5115850.0, ans=0.125 2024-08-21 06:10:24,400 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5115950.0, ans=0.125 2024-08-21 06:10:26,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5115950.0, ans=0.1 2024-08-21 06:10:44,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5116050.0, ans=0.0 2024-08-21 06:10:52,657 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. 
limit=15.0 2024-08-21 06:11:03,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5116150.0, ans=0.0 2024-08-21 06:11:03,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5116150.0, ans=0.0 2024-08-21 06:11:10,102 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7850, loss[loss=0.063, beats_loss=0.009339, ecapa_loss=0.0001594, whisper_loss=0.05207, over 13337.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01028, ecapa_loss=0.0001385, whisper_loss=0.09029, over 3806315.19 frames. ], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:11:59,288 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-21 06:12:04,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5116550.0, ans=0.125 2024-08-21 06:12:04,395 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5116550.0, ans=0.2 2024-08-21 06:12:08,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.271e+01 2.419e+01 2.702e+01 3.999e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-21 06:12:11,768 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5116550.0, ans=0.0 2024-08-21 06:12:18,656 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-21 06:12:41,171 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7900, loss[loss=0.1246, beats_loss=0.008373, ecapa_loss=0.0001763, whisper_loss=0.1145, over 19994.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001384, whisper_loss=0.0901, over 3829639.57 frames. 
], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:13:03,946 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-21 06:13:29,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5116950.0, ans=0.125 2024-08-21 06:13:54,234 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 06:13:57,709 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 18 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 06:13:59,141 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 17 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-21 06:14:03,790 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-21 06:14:10,362 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 7950, loss[loss=0.102, beats_loss=0.009736, ecapa_loss=0.0001506, whisper_loss=0.09077, over 16526.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.08972, over 3855477.61 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:14:35,860 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 15 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-21 06:14:40,897 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 26 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-21 06:14:50,762 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
29 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-21 06:14:54,219 WARNING [optim.py:496] (2/4) Scaling gradients by 0.07848511636257172, model_norm_threshold=48.37834167480469 2024-08-21 06:14:54,386 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.905e+04, grad_sumsq=8.269e+06, orig_rms_sq=1.077e-02 2024-08-21 06:14:58,484 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5117450.0, ans=0.125 2024-08-21 06:15:06,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.240e+01 2.515e+01 2.710e+01 6.164e+02, threshold=5.030e+01, percent-clipped=3.0 2024-08-21 06:15:06,471 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 06:15:07,995 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-21 06:15:19,089 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 06:15:32,100 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 21 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-21 06:15:37,033 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8000, loss[loss=0.1025, beats_loss=0.01151, ecapa_loss=0.000113, whisper_loss=0.08991, over 23016.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001384, whisper_loss=0.0891, over 3823563.82 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:15:40,109 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 
27 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-21 06:15:50,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5117750.0, ans=0.125 2024-08-21 06:16:04,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5117850.0, ans=0.125 2024-08-21 06:16:37,269 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5118050.0, ans=0.1 2024-08-21 06:16:47,909 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5118150.0, ans=0.0 2024-08-21 06:17:05,643 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8050, loss[loss=0.1182, beats_loss=0.008889, ecapa_loss=0.0001251, whisper_loss=0.1081, over 24406.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001384, whisper_loss=0.08942, over 3867071.48 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:17:08,564 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5118250.0, ans=0.125 2024-08-21 06:17:22,972 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 13 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-21 06:17:26,684 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 06:17:30,736 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:17:41,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-21 06:18:00,261 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 06:18:03,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.278e+01 2.668e+01 2.870e+01 8.505e+01, threshold=5.336e+01, percent-clipped=1.0 2024-08-21 06:18:23,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5118650.0, ans=0.1 2024-08-21 06:18:34,977 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8100, loss[loss=0.1019, beats_loss=0.01153, ecapa_loss=0.0001353, whisper_loss=0.08906, over 16989.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.08975, over 3846469.72 frames. ], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:19:04,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=15.0 2024-08-21 06:19:14,874 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5118950.0, ans=0.125 2024-08-21 06:19:18,337 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5118950.0, ans=0.0 2024-08-21 06:19:33,754 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5119050.0, ans=0.0 2024-08-21 06:19:38,457 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
21 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-21 06:19:48,669 WARNING [optim.py:496] (2/4) Scaling gradients by 0.08069697767496109, model_norm_threshold=53.36321258544922 2024-08-21 06:19:48,833 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.079e+04, grad_sumsq=7.079e+04, orig_rms_sq=1.000e+00 2024-08-21 06:19:50,916 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5119150.0, ans=0.0 2024-08-21 06:19:56,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5119150.0, ans=0.0 2024-08-21 06:20:02,406 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8150, loss[loss=0.1048, beats_loss=0.008332, ecapa_loss=0.0001463, whisper_loss=0.09499, over 14083.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001379, whisper_loss=0.09034, over 3851993.13 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:20:17,531 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2024-08-21 06:20:33,311 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-21 06:20:43,503 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-21 06:20:44,774 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
22 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-21 06:20:58,316 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.296e+01 2.463e+01 2.711e+01 6.613e+02, threshold=4.926e+01, percent-clipped=2.0 2024-08-21 06:21:08,347 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5119550.0, ans=0.1 2024-08-21 06:21:11,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5119650.0, ans=0.2 2024-08-21 06:21:14,114 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-08-21 06:21:27,162 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5119750.0, ans=0.125 2024-08-21 06:21:27,802 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8200, loss[loss=0.09365, beats_loss=0.01022, ecapa_loss=0.0001588, whisper_loss=0.08184, over 21875.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.000138, whisper_loss=0.08992, over 3798963.51 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:21:30,852 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5119750.0, ans=0.125 2024-08-21 06:22:15,086 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-21 06:22:27,918 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5120050.0, ans=0.0 2024-08-21 06:22:36,174 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-21 06:22:49,625 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 
9 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-21 06:22:50,331 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:22:57,646 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8250, loss[loss=0.1065, beats_loss=0.01034, ecapa_loss=0.0001777, whisper_loss=0.09434, over 23231.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08989, over 3830078.34 frames. ], batch size: 96, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:23:13,599 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0 2024-08-21 06:23:54,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.309e+01 2.543e+01 2.823e+01 1.094e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 06:24:25,024 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8300, loss[loss=0.1188, beats_loss=0.009455, ecapa_loss=0.0001666, whisper_loss=0.1077, over 19459.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08968, over 3829621.59 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:24:41,219 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5120750.0, ans=0.0 2024-08-21 06:24:45,955 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 39 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-21 06:24:49,291 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 06:24:53,618 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5120850.0, ans=0.0 2024-08-21 06:24:54,550 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 
19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 06:25:20,463 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5121050.0, ans=0.125 2024-08-21 06:25:23,489 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 19 from LS+wenet, 20 from Vox, 13 fro AS 2024-08-21 06:25:23,880 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5121050.0, ans=0.0 2024-08-21 06:25:29,915 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 06:25:34,225 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-21 06:25:35,467 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5 2024-08-21 06:25:36,773 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5121150.0, ans=0.2 2024-08-21 06:25:55,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5121250.0, ans=0.0 2024-08-21 06:25:56,026 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8350, loss[loss=0.09073, beats_loss=0.01106, ecapa_loss=0.0001732, whisper_loss=0.07794, over 20288.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.08989, over 3828816.48 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:25:59,498 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.17 vs. 
limit=15.0 2024-08-21 06:26:13,222 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5121250.0, ans=0.125 2024-08-21 06:26:17,746 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 06:26:43,008 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5121450.0, ans=0.125 2024-08-21 06:26:51,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5121450.0, ans=0.0 2024-08-21 06:26:55,134 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-21 06:26:55,617 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0374552384018898, model_norm_threshold=50.851959228515625 2024-08-21 06:26:55,783 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.895e+05, grad_sumsq=1.757e+07, orig_rms_sq=1.078e-02 2024-08-21 06:26:59,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.288e+01 2.464e+01 2.782e+01 1.358e+03, threshold=4.928e+01, percent-clipped=2.0 2024-08-21 06:27:12,577 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-21 06:27:33,207 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8400, loss[loss=0.1111, beats_loss=0.00991, ecapa_loss=0.0001617, whisper_loss=0.09957, over 19114.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.09024, over 3867232.76 frames. 
], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:27:36,420 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2024-08-21 06:27:43,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5121750.0, ans=0.0 2024-08-21 06:28:15,528 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2024-08-21 06:28:34,491 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 06:28:42,060 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 06:29:06,828 WARNING [optim.py:496] (2/4) Scaling gradients by 0.038237348198890686, model_norm_threshold=49.277313232421875 2024-08-21 06:29:06,996 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.805e+05, grad_sumsq=1.805e+05, orig_rms_sq=1.000e+00 2024-08-21 06:29:07,039 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8450, loss[loss=0.1253, beats_loss=0.02229, ecapa_loss=0.0001586, whisper_loss=0.1014, over 21445.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001387, whisper_loss=0.08961, over 3870542.17 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:29:19,003 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 
29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 06:29:36,833 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5122350.0, ans=0.125 2024-08-21 06:30:02,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=5122450.0, ans=0.05 2024-08-21 06:30:03,067 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5 2024-08-21 06:30:05,690 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 06:30:07,353 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5122550.0, ans=0.125 2024-08-21 06:30:11,447 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-21 06:30:11,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.430e+01 2.632e+01 3.117e+01 1.289e+03, threshold=5.264e+01, percent-clipped=4.0 2024-08-21 06:30:19,507 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:30:41,110 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5122650.0, ans=0.125 2024-08-21 06:30:46,101 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8500, loss[loss=0.09317, beats_loss=0.01113, ecapa_loss=0.0001814, whisper_loss=0.08023, over 20405.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.08976, over 3851764.56 frames. 
], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:30:49,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5122750.0, ans=0.2 2024-08-21 06:31:07,350 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5122850.0, ans=0.035 2024-08-21 06:31:36,062 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 23 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-21 06:31:51,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5123050.0, ans=0.1 2024-08-21 06:31:53,723 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 25 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-21 06:32:10,051 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5123150.0, ans=0.125 2024-08-21 06:32:24,512 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8550, loss[loss=0.09039, beats_loss=0.01258, ecapa_loss=0.000186, whisper_loss=0.07596, over 15633.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08961, over 3846429.39 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:32:30,011 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5123250.0, ans=0.125 2024-08-21 06:33:17,431 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 15 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-21 06:33:19,281 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-21 06:33:21,363 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-21 06:33:28,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-08-21 06:33:30,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.375e+01 2.634e+01 2.950e+01 1.431e+02, threshold=5.267e+01, percent-clipped=1.0 2024-08-21 06:33:47,534 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5123650.0, ans=15.0 2024-08-21 06:34:04,711 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8600, loss[loss=0.08414, beats_loss=0.009265, ecapa_loss=0.0001379, whisper_loss=0.0735, over 16680.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.0898, over 3807112.72 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:34:13,795 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2024-08-21 06:34:19,264 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5123750.0, ans=0.125 2024-08-21 06:34:20,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5123750.0, ans=0.0 2024-08-21 06:34:25,028 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 06:34:28,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5123850.0, ans=0.125 2024-08-21 06:34:50,381 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 
15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-21 06:34:57,367 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2024-08-21 06:35:14,293 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2024-08-21 06:35:18,525 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.77 vs. limit=22.5 2024-08-21 06:35:19,650 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5124050.0, ans=0.0 2024-08-21 06:35:43,142 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8650, loss[loss=0.08745, beats_loss=0.01088, ecapa_loss=0.000175, whisper_loss=0.07482, over 21192.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001395, whisper_loss=0.08981, over 3822549.67 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:36:03,338 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.51 vs. limit=10.0 2024-08-21 06:36:10,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5124350.0, ans=0.0 2024-08-21 06:36:39,950 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2024-08-21 06:36:42,172 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 06:36:43,120 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=5124550.0, ans=10.0 2024-08-21 06:36:43,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5124550.0, ans=0.125 2024-08-21 06:36:47,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.320e+01 2.649e+01 2.917e+01 4.406e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-21 06:37:02,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-21 06:37:24,441 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8700, loss[loss=0.09587, beats_loss=0.01064, ecapa_loss=0.0001049, whisper_loss=0.08418, over 17976.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.0001394, whisper_loss=0.09061, over 3811306.31 frames. 
], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:37:27,892 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5124750.0, ans=0.2 2024-08-21 06:37:55,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5124850.0, ans=0.125 2024-08-21 06:38:10,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5124950.0, ans=0.1 2024-08-21 06:38:12,355 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5124950.0, ans=0.07 2024-08-21 06:38:18,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5124950.0, ans=0.125 2024-08-21 06:38:50,121 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 06:39:00,040 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8750, loss[loss=0.1192, beats_loss=0.009879, ecapa_loss=0.0001363, whisper_loss=0.1079, over 23224.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.0001397, whisper_loss=0.0909, over 3822798.62 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:39:12,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5125250.0, ans=0.0 2024-08-21 06:39:16,479 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-08-21 06:39:18,736 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 
28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-21 06:39:32,154 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5125350.0, ans=0.125 2024-08-21 06:39:50,637 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5125450.0, ans=0.125 2024-08-21 06:40:01,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.240e+01 2.524e+01 2.807e+01 1.444e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-21 06:40:02,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5125550.0, ans=0.125 2024-08-21 06:40:28,667 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2024-08-21 06:40:28,919 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-08-21 06:40:34,711 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8800, loss[loss=0.1078, beats_loss=0.01215, ecapa_loss=0.0001161, whisper_loss=0.09453, over 17954.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0103, ecapa_loss=0.0001396, whisper_loss=0.09096, over 3815342.13 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:40:37,184 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5125750.0, ans=0.0 2024-08-21 06:40:55,031 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-21 06:40:58,225 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 12 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-21 06:40:59,978 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 06:41:02,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5125850.0, ans=0.0 2024-08-21 06:41:11,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5125950.0, ans=0.125 2024-08-21 06:41:28,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5126050.0, ans=0.125 2024-08-21 06:41:37,283 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5126050.0, ans=0.1 2024-08-21 06:41:56,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5126150.0, ans=0.125 2024-08-21 06:42:04,617 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8850, loss[loss=0.1027, beats_loss=0.008207, ecapa_loss=0.0001694, whisper_loss=0.09284, over 20582.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001394, whisper_loss=0.08928, over 3810106.93 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:42:07,529 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:42:11,816 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5126250.0, ans=0.0 2024-08-21 06:42:12,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.98 vs. 
limit=22.5 2024-08-21 06:42:16,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:43:11,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.228e+01 2.528e+01 2.793e+01 5.836e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-21 06:43:38,797 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5126650.0, ans=0.125 2024-08-21 06:43:38,846 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5126650.0, ans=0.0 2024-08-21 06:43:39,842 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 06:43:41,833 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 06:43:45,885 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8900, loss[loss=0.09016, beats_loss=0.01065, ecapa_loss=0.0001148, whisper_loss=0.07837, over 15588.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001391, whisper_loss=0.0895, over 3824344.27 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:43:46,137 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-21 06:43:52,949 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5126750.0, ans=0.125 2024-08-21 06:44:03,709 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.14 vs. limit=22.5 2024-08-21 06:44:08,409 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
29 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-21 06:44:26,061 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5126950.0, ans=0.0 2024-08-21 06:44:43,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5127050.0, ans=0.0 2024-08-21 06:44:43,369 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=12.0 2024-08-21 06:45:05,006 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5127150.0, ans=0.2 2024-08-21 06:45:06,260 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 25 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-21 06:45:18,235 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-21 06:45:18,665 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 8950, loss[loss=0.07377, beats_loss=0.01265, ecapa_loss=0.0001198, whisper_loss=0.05992, over 13864.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.000139, whisper_loss=0.08986, over 3832307.00 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:45:25,760 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5127250.0, ans=0.125 2024-08-21 06:45:26,184 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.97 vs. 
limit=12.0 2024-08-21 06:45:36,546 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5127250.0, ans=0.2 2024-08-21 06:45:54,780 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5127350.0, ans=0.125 2024-08-21 06:46:20,256 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 27 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-21 06:46:23,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.262e+01 2.428e+01 2.785e+01 3.880e+01, threshold=4.857e+01, percent-clipped=0.0 2024-08-21 06:46:24,556 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5127550.0, ans=0.1 2024-08-21 06:46:32,140 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-21 06:46:56,919 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9000, loss[loss=0.1041, beats_loss=0.009623, ecapa_loss=0.0001287, whisper_loss=0.09317, over 23094.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001383, whisper_loss=0.08998, over 3849349.26 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:46:56,919 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-21 06:47:34,641 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005065, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 06:47:57,451 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on SV_voxceleb1: loss=0.003886, beats_loss=0, ecapa_loss=0.0003886, whisper_loss=0, over 944235.00 frames. 2024-08-21 06:49:39,589 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on AT_audioset: loss=0.02296, beats_loss=0.02296, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-21 06:49:39,597 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-21 06:49:59,260 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-21 06:50:01,076 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5127850.0, ans=0.125 2024-08-21 06:50:09,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5127850.0, ans=0.125 2024-08-21 06:50:16,287 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5127950.0, ans=0.0 2024-08-21 06:50:17,565 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5127950.0, ans=0.1 2024-08-21 06:50:18,649 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 10 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 06:50:46,102 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=12.0 2024-08-21 06:50:56,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5128150.0, ans=0.125 2024-08-21 06:51:08,176 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9050, loss[loss=0.09118, beats_loss=0.01259, ecapa_loss=0.0001036, whisper_loss=0.07756, over 18683.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001376, whisper_loss=0.0898, over 3826925.59 frames. 
], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:51:14,541 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5128250.0, ans=0.125 2024-08-21 06:51:26,063 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-21 06:51:27,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5128350.0, ans=0.2 2024-08-21 06:51:36,549 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5128350.0, ans=0.0 2024-08-21 06:51:43,925 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-21 06:51:44,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5128350.0, ans=0.2 2024-08-21 06:51:55,736 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5128450.0, ans=0.125 2024-08-21 06:52:03,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5128450.0, ans=0.0 2024-08-21 06:52:03,795 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5128450.0, ans=0.1 2024-08-21 06:52:07,335 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5128550.0, ans=0.0 2024-08-21 06:52:12,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.208e+01 2.419e+01 2.777e+01 1.932e+02, threshold=4.839e+01, percent-clipped=1.0 2024-08-21 06:52:15,095 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5128550.0, 
ans=0.09899494936611666 2024-08-21 06:52:23,131 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-21 06:52:33,312 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 06:52:41,764 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9100, loss[loss=0.1134, beats_loss=0.008474, ecapa_loss=0.0001447, whisper_loss=0.1035, over 22138.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001379, whisper_loss=0.09043, over 3812166.16 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:52:41,992 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 28 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-21 06:53:09,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5128850.0, ans=0.125 2024-08-21 06:53:10,262 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.08 vs. limit=22.5 2024-08-21 06:53:16,068 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5128850.0, ans=0.0 2024-08-21 06:53:17,126 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-21 06:53:26,471 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5128950.0, ans=0.1 2024-08-21 06:54:15,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9150, loss[loss=0.1086, beats_loss=0.01257, ecapa_loss=0.0001309, whisper_loss=0.09475, over 14052.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001377, whisper_loss=0.09003, over 3805329.75 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:54:23,520 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5129250.0, ans=0.0 2024-08-21 06:54:29,443 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2024-08-21 06:54:41,849 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5129350.0, ans=0.125 2024-08-21 06:54:42,876 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 12 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-21 06:54:43,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5129350.0, ans=0.0 2024-08-21 06:54:47,263 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5129350.0, ans=0.0 2024-08-21 06:54:48,220 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 06:55:06,792 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
28 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-21 06:55:07,685 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5129450.0, ans=0.125 2024-08-21 06:55:08,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=5129450.0, ans=15.0 2024-08-21 06:55:16,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.222e+01 2.469e+01 2.826e+01 4.057e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-21 06:55:41,103 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5129650.0, ans=0.0 2024-08-21 06:55:46,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5129750.0, ans=0.125 2024-08-21 06:55:46,854 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9200, loss[loss=0.09017, beats_loss=0.01163, ecapa_loss=0.0001736, whisper_loss=0.0768, over 14942.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001378, whisper_loss=0.0893, over 3810493.42 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:56:13,988 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5129850.0, ans=0.2 2024-08-21 06:56:18,321 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5129850.0, ans=0.05 2024-08-21 06:56:18,331 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5129850.0, ans=0.0 2024-08-21 06:56:53,066 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. 
limit=6.0 2024-08-21 06:57:13,434 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5130150.0, ans=0.2 2024-08-21 06:57:16,330 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 06:57:22,055 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9250, loss[loss=0.1216, beats_loss=0.007954, ecapa_loss=0.0001342, whisper_loss=0.1123, over 15276.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001378, whisper_loss=0.08979, over 3820414.46 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:57:46,255 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.11 vs. limit=15.0 2024-08-21 06:58:14,275 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 06:58:23,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.254e+01 2.574e+01 2.946e+01 4.918e+02, threshold=5.149e+01, percent-clipped=3.0 2024-08-21 06:58:26,936 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0 2024-08-21 06:58:34,129 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5130550.0, ans=0.125 2024-08-21 06:58:34,143 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5130550.0, ans=0.0 2024-08-21 06:58:44,876 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 18 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-21 06:58:50,712 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 
12 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-21 06:58:58,622 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9300, loss[loss=0.09846, beats_loss=0.0108, ecapa_loss=0.0001469, whisper_loss=0.08619, over 19479.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.000139, whisper_loss=0.09005, over 3820744.49 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:59:10,358 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 06:59:19,333 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5130850.0, ans=0.125 2024-08-21 06:59:21,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5130850.0, ans=0.1 2024-08-21 06:59:36,986 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-21 06:59:43,254 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-21 07:00:22,448 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5131150.0, ans=0.125 2024-08-21 07:00:25,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5131150.0, ans=0.015 2024-08-21 07:00:27,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5131150.0, ans=0.125 2024-08-21 07:00:33,903 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9350, loss[loss=0.1144, beats_loss=0.009025, ecapa_loss=0.0001601, whisper_loss=0.1038, over 14847.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.000138, whisper_loss=0.08953, over 3844462.41 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:00:40,976 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5131250.0, ans=0.0 2024-08-21 07:00:45,945 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5131250.0, ans=0.125 2024-08-21 07:00:54,179 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2024-08-21 07:01:05,647 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2024-08-21 07:01:07,064 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=15.0 2024-08-21 07:01:24,343 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-21 07:01:32,100 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 07:01:35,568 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.276e+01 2.548e+01 2.859e+01 2.021e+02, threshold=5.096e+01, percent-clipped=1.0 2024-08-21 07:01:36,599 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.899e+05 2024-08-21 07:01:36,767 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-21 07:01:51,290 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5131650.0, ans=0.1 2024-08-21 07:01:52,235 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
16 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 07:02:07,480 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9400, loss[loss=0.1085, beats_loss=0.009512, ecapa_loss=0.0001327, whisper_loss=0.09767, over 16737.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001398, whisper_loss=0.08936, over 3835102.14 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:02:10,530 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-08-21 07:02:16,486 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-21 07:02:33,667 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 19 from LS+wenet, 17 from Vox, 53 fro AS 2024-08-21 07:03:03,590 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5132050.0, ans=0.125 2024-08-21 07:03:31,757 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=22.5 2024-08-21 07:03:34,451 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-21 07:03:40,015 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9450, loss[loss=0.09105, beats_loss=0.01113, ecapa_loss=0.0001012, whisper_loss=0.07891, over 15389.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.0001401, whisper_loss=0.08909, over 3842118.19 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:03:50,344 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5132250.0, ans=0.0 2024-08-21 07:04:34,847 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5132550.0, ans=0.1 2024-08-21 07:04:40,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.244e+01 2.507e+01 2.864e+01 1.489e+02, threshold=5.014e+01, percent-clipped=2.0 2024-08-21 07:04:51,213 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 07:04:59,422 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5132650.0, ans=0.125 2024-08-21 07:05:07,374 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.34 vs. limit=10.0 2024-08-21 07:05:13,303 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9500, loss[loss=0.1016, beats_loss=0.01003, ecapa_loss=0.0001481, whisper_loss=0.09006, over 21293.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001406, whisper_loss=0.0893, over 3815781.52 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:05:35,095 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.99 vs. limit=22.5 2024-08-21 07:05:43,700 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-21 07:05:44,615 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 
12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 07:05:57,101 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5132950.0, ans=0.0 2024-08-21 07:05:59,412 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5132950.0, ans=0.0 2024-08-21 07:06:14,553 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5133050.0, ans=0.125 2024-08-21 07:06:16,771 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-08-21 07:06:17,351 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 07:06:43,972 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 07:06:44,799 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-08-21 07:06:46,670 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-08-21 07:06:47,323 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9550, loss[loss=0.09347, beats_loss=0.0095, ecapa_loss=0.0001514, whisper_loss=0.08245, over 19468.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001395, whisper_loss=0.08952, over 3818801.70 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:06:55,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5133250.0, ans=0.2 2024-08-21 07:07:03,043 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 
17 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 07:07:07,911 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 07:07:19,512 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 27 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-21 07:07:29,495 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2024-08-21 07:07:49,019 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:07:49,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.309e+01 2.529e+01 2.824e+01 3.800e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 07:07:51,043 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5133550.0, ans=0.125 2024-08-21 07:08:04,137 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5133650.0, ans=0.125 2024-08-21 07:08:13,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5133650.0, ans=0.125 2024-08-21 07:08:18,784 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5133750.0, ans=0.0 2024-08-21 07:08:19,678 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9600, loss[loss=0.1292, beats_loss=0.009067, ecapa_loss=0.0001142, whisper_loss=0.119, over 18800.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001398, whisper_loss=0.08941, over 3831800.21 frames. 
], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:08:20,690 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5133750.0, ans=0.125 2024-08-21 07:08:39,348 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-21 07:08:42,610 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=5133850.0, ans=0.025 2024-08-21 07:08:54,025 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5133850.0, ans=0.0 2024-08-21 07:09:02,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5133950.0, ans=0.125 2024-08-21 07:09:26,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5134050.0, ans=0.125 2024-08-21 07:09:37,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5134150.0, ans=0.0 2024-08-21 07:09:43,145 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 07:09:45,411 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 07:09:48,713 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9650, loss[loss=0.09938, beats_loss=0.009531, ecapa_loss=0.0001785, whisper_loss=0.08806, over 19709.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08939, over 3788034.66 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:09:57,212 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 07:10:09,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5134350.0, ans=0.125 2024-08-21 07:10:11,584 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=5134350.0, ans=12.0 2024-08-21 07:10:16,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5134350.0, ans=0.125 2024-08-21 07:10:40,235 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5134450.0, ans=0.0 2024-08-21 07:10:49,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.246e+01 2.506e+01 2.851e+01 2.599e+02, threshold=5.012e+01, percent-clipped=4.0 2024-08-21 07:11:08,128 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5 2024-08-21 07:11:09,693 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5134650.0, ans=0.0 2024-08-21 07:11:17,957 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-21 07:11:19,431 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9700, loss[loss=0.1009, beats_loss=0.009313, ecapa_loss=0.0001417, whisper_loss=0.09019, over 16138.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001398, whisper_loss=0.08952, over 3802416.04 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:11:46,088 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5134850.0, ans=0.1 2024-08-21 07:12:05,205 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 07:12:50,862 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9750, loss[loss=0.08313, beats_loss=0.01217, ecapa_loss=0.0001379, whisper_loss=0.06959, over 21799.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01044, ecapa_loss=0.0001404, whisper_loss=0.08882, over 3794056.07 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:12:58,024 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-21 07:12:58,530 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5135250.0, ans=0.0 2024-08-21 07:13:01,544 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 07:13:38,689 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5135450.0, ans=0.0 2024-08-21 07:13:51,782 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5135550.0, ans=0.0 2024-08-21 07:13:52,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.258e+01 2.458e+01 2.685e+01 1.396e+02, threshold=4.917e+01, percent-clipped=1.0 2024-08-21 07:14:10,845 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-21 07:14:13,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5135650.0, ans=0.0 2024-08-21 07:14:20,951 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9800, loss[loss=0.1044, beats_loss=0.01115, ecapa_loss=0.0001125, whisper_loss=0.0921, over 19134.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01051, ecapa_loss=0.0001387, whisper_loss=0.08812, over 3783157.50 frames. 
], batch size: 74, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:14:22,407 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5135750.0, ans=0.2 2024-08-21 07:14:34,071 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5135750.0, ans=0.125 2024-08-21 07:14:35,544 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 07:15:08,687 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5135950.0, ans=0.0 2024-08-21 07:15:21,578 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5136050.0, ans=0.0 2024-08-21 07:15:34,934 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-21 07:15:38,743 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5136150.0, ans=0.0 2024-08-21 07:15:38,776 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5136150.0, ans=0.0 2024-08-21 07:15:54,757 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9850, loss[loss=0.08817, beats_loss=0.009329, ecapa_loss=0.000149, whisper_loss=0.07735, over 14859.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001384, whisper_loss=0.08915, over 3785296.40 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:15:57,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5136250.0, ans=0.0 2024-08-21 07:16:29,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. 
limit=22.5 2024-08-21 07:16:35,972 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 07:16:56,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5136550.0, ans=0.125 2024-08-21 07:17:00,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.250e+01 2.454e+01 2.726e+01 7.431e+01, threshold=4.908e+01, percent-clipped=3.0 2024-08-21 07:17:16,862 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5136650.0, ans=0.125 2024-08-21 07:17:23,496 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5136650.0, ans=0.125 2024-08-21 07:17:33,749 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9900, loss[loss=0.08649, beats_loss=0.01173, ecapa_loss=0.0001071, whisper_loss=0.07369, over 23373.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001383, whisper_loss=0.08896, over 3760759.25 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:18:02,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5136850.0, ans=0.0 2024-08-21 07:18:05,639 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 07:18:14,818 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5136950.0, ans=0.07 2024-08-21 07:18:16,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-21 07:18:19,264 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
19 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-21 07:18:21,688 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 31 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-21 07:18:23,332 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 22 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-21 07:18:55,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5137150.0, ans=0.125 2024-08-21 07:19:04,893 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5137150.0, ans=0.125 2024-08-21 07:19:07,790 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 9950, loss[loss=0.1004, beats_loss=0.01022, ecapa_loss=0.0001357, whisper_loss=0.08887, over 23185.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01046, ecapa_loss=0.000138, whisper_loss=0.08857, over 3778015.46 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:19:23,183 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-21 07:19:25,107 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-21 07:19:27,939 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2024-08-21 07:19:32,503 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 18 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-21 07:19:33,183 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5137350.0, ans=0.125 2024-08-21 07:19:43,052 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 07:19:43,843 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5137450.0, ans=0.125 2024-08-21 07:19:51,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5137450.0, ans=0.125 2024-08-21 07:19:56,690 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-21 07:20:10,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5137550.0, ans=0.125 2024-08-21 07:20:11,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.331e+01 2.493e+01 2.737e+01 3.742e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-21 07:20:16,195 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-08-21 07:20:33,316 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 16 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-21 07:20:40,315 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10000, loss[loss=0.07607, beats_loss=0.01116, ecapa_loss=0.000111, whisper_loss=0.0638, over 15302.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01046, ecapa_loss=0.0001368, whisper_loss=0.08902, over 3766009.68 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:20:51,472 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 07:21:14,878 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5137850.0, ans=0.0 2024-08-21 07:21:19,377 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-21 07:21:36,551 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5138050.0, ans=0.1 2024-08-21 07:21:38,139 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5138050.0, ans=0.04949747468305833 2024-08-21 07:21:40,349 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5138050.0, ans=0.0 2024-08-21 07:21:48,507 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 17 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-21 07:22:14,657 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10050, loss[loss=0.0898, beats_loss=0.01311, ecapa_loss=0.0001196, whisper_loss=0.07549, over 22976.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001379, whisper_loss=0.08994, over 3766420.40 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:22:32,937 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 07:22:42,019 WARNING [optim.py:496] (2/4) Scaling gradients by 0.01775754615664482, model_norm_threshold=49.858680725097656 2024-08-21 07:22:42,192 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.817e+06, grad_sumsq=1.684e+08, orig_rms_sq=1.079e-02 2024-08-21 07:22:49,215 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5138350.0, ans=0.125 2024-08-21 07:22:53,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5138350.0, ans=0.0 2024-08-21 07:23:24,475 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-21 07:23:27,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.254e+01 2.564e+01 3.028e+01 2.808e+03, threshold=5.129e+01, percent-clipped=1.0 2024-08-21 07:23:30,276 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2024-08-21 07:24:02,642 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10100, loss[loss=0.08039, beats_loss=0.011, ecapa_loss=0.0001717, whisper_loss=0.06768, over 16668.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.0901, over 3804471.59 frames. 
], batch size: 72, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:24:15,954 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5138750.0, ans=0.125 2024-08-21 07:24:18,173 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5138750.0, ans=0.1 2024-08-21 07:24:22,311 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5138850.0, ans=0.0 2024-08-21 07:24:24,124 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.685e-01 2024-08-21 07:24:31,894 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5138850.0, ans=0.0 2024-08-21 07:24:42,229 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5138950.0, ans=0.1 2024-08-21 07:24:46,729 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-21 07:25:01,980 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 31 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-21 07:25:33,933 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 16 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-21 07:25:36,846 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10150, loss[loss=0.1333, beats_loss=0.007508, ecapa_loss=0.0001514, whisper_loss=0.1243, over 15458.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09075, over 3791632.63 frames. 
], batch size: 60, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:25:41,402 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5139250.0, ans=0.125 2024-08-21 07:25:43,377 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=5139250.0, ans=0.05 2024-08-21 07:25:44,337 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 21 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-21 07:26:11,351 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 07:26:36,707 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 33 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 07:26:44,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.296e+01 2.505e+01 2.874e+01 3.996e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-21 07:26:48,739 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 24 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-21 07:26:49,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5139550.0, ans=0.125 2024-08-21 07:26:55,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5139650.0, ans=0.2 2024-08-21 07:26:59,924 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-08-21 07:27:05,225 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5139650.0, ans=0.0 2024-08-21 07:27:15,373 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10200, loss[loss=0.1056, beats_loss=0.009284, ecapa_loss=0.0001374, whisper_loss=0.0949, over 19992.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001388, whisper_loss=0.08999, over 3801983.13 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:27:19,047 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0 2024-08-21 07:27:30,820 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-21 07:27:31,673 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5139750.0, ans=0.125 2024-08-21 07:27:41,272 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5139850.0, ans=0.125 2024-08-21 07:28:06,598 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 24 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 07:28:29,968 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5140150.0, ans=0.1 2024-08-21 07:28:34,990 WARNING [optim.py:496] (2/4) Scaling gradients by 0.040334705263376236, model_norm_threshold=50.09689712524414 2024-08-21 07:28:35,162 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.841e+05, grad_sumsq=1.841e+05, orig_rms_sq=1.000e+00 2024-08-21 07:28:46,823 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5 2024-08-21 07:28:50,994 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10250, loss[loss=0.1011, beats_loss=0.01103, ecapa_loss=0.0001672, whisper_loss=0.08838, over 20643.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001385, whisper_loss=0.08981, over 3817821.52 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:28:58,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5140250.0, ans=0.0 2024-08-21 07:29:02,753 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5140250.0, ans=0.125 2024-08-21 07:29:10,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5140350.0, ans=0.125 2024-08-21 07:29:18,720 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 07:29:35,797 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 07:29:44,312 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 07:29:47,026 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5140550.0, ans=0.125 2024-08-21 07:29:56,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.288e+01 2.559e+01 2.960e+01 1.242e+03, threshold=5.118e+01, percent-clipped=2.0 2024-08-21 07:30:05,946 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5140550.0, ans=0.2 2024-08-21 07:30:11,418 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 07:30:17,570 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=5140650.0, ans=0.025 2024-08-21 07:30:28,604 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10300, loss[loss=0.1016, beats_loss=0.01076, ecapa_loss=0.0001362, whisper_loss=0.08947, over 21078.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001391, whisper_loss=0.08943, over 3861270.69 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:30:49,086 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5140750.0, ans=0.2 2024-08-21 07:30:57,696 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-21 07:31:19,447 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5140950.0, ans=0.1 2024-08-21 07:31:50,851 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-21 07:32:24,007 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5141250.0, ans=0.125 2024-08-21 07:32:24,715 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10350, loss[loss=0.1013, beats_loss=0.009317, ecapa_loss=0.0001357, whisper_loss=0.09065, over 20548.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001388, whisper_loss=0.08985, over 3876677.44 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:32:25,982 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5141250.0, ans=0.125 2024-08-21 07:32:34,592 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 
20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 07:32:41,870 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5141250.0, ans=0.125 2024-08-21 07:33:07,499 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5141450.0, ans=0.2 2024-08-21 07:33:32,181 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5141550.0, ans=0.0 2024-08-21 07:33:35,176 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.267e+01 2.630e+01 2.969e+01 5.000e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-21 07:33:38,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5141550.0, ans=0.125 2024-08-21 07:33:47,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5141650.0, ans=0.125 2024-08-21 07:34:03,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5141650.0, ans=0.2 2024-08-21 07:34:08,029 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10400, loss[loss=0.08944, beats_loss=0.01167, ecapa_loss=0.0001409, whisper_loss=0.07636, over 19036.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001379, whisper_loss=0.08936, over 3880123.46 frames. 
], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:34:27,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5141850.0, ans=0.125 2024-08-21 07:34:29,516 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5141850.0, ans=0.1 2024-08-21 07:34:41,020 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 07:34:59,305 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2024-08-21 07:35:26,235 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-21 07:35:27,704 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0 2024-08-21 07:35:33,934 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-21 07:35:39,266 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 35 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-21 07:35:54,748 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10450, loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001224, whisper_loss=0.09191, over 22776.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.08908, over 3904520.71 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:35:56,177 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 
25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 07:36:31,802 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5142350.0, ans=0.04949747468305833 2024-08-21 07:36:44,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5142450.0, ans=0.0 2024-08-21 07:37:18,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.371e+01 2.736e+01 3.043e+01 5.041e+02, threshold=5.472e+01, percent-clipped=3.0 2024-08-21 07:37:22,733 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5142550.0, ans=0.125 2024-08-21 07:37:42,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5142650.0, ans=0.0 2024-08-21 07:37:44,792 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=22.5 2024-08-21 07:37:53,570 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10500, loss[loss=0.07588, beats_loss=0.01148, ecapa_loss=0.0001485, whisper_loss=0.06291, over 20073.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08892, over 3890402.04 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:37:58,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5142750.0, ans=0.125 2024-08-21 07:38:09,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5142750.0, ans=0.1 2024-08-21 07:38:36,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5142850.0, ans=0.125 2024-08-21 07:38:37,534 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 
22 from LS+wenet, 22 from Vox, 26 from AS 2024-08-21 07:38:57,504 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5142950.0, ans=0.95 2024-08-21 07:39:02,480 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 from AS 2024-08-21 07:39:06,032 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 from AS 2024-08-21 07:39:15,918 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.573e-02 2024-08-21 07:39:41,921 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10550, loss[loss=0.106, beats_loss=0.009396, ecapa_loss=0.0001487, whisper_loss=0.09516, over 16310.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.0894, over 3901493.69 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:39:44,943 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=5143250.0, ans=0.02 2024-08-21 07:39:49,308 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5143250.0, ans=0.125 2024-08-21 07:40:01,273 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 16 from LS+wenet, 25 from Vox, 15 from AS 2024-08-21 07:40:13,938 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5 2024-08-21 07:40:21,094 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 
23 from LS+wenet, 20 from Vox, 38 from AS 2024-08-21 07:40:26,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5143450.0, ans=0.125 2024-08-21 07:40:50,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.474e+01 2.751e+01 3.009e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-21 07:41:16,375 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5143650.0, ans=0.2 2024-08-21 07:41:21,520 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10600, loss[loss=0.1092, beats_loss=0.008314, ecapa_loss=0.0001243, whisper_loss=0.09967, over 19635.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.0893, over 3874183.31 frames. ], batch size: 75, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:41:27,165 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 from AS 2024-08-21 07:41:33,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5143750.0, ans=0.0 2024-08-21 07:41:33,288 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5143750.0, ans=0.125 2024-08-21 07:42:00,458 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 from AS 2024-08-21 07:42:10,230 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5143950.0, ans=0.125 2024-08-21 07:42:36,364 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.98 vs. 
limit=10.0 2024-08-21 07:42:52,153 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5144150.0, ans=0.05 2024-08-21 07:42:55,661 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10650, loss[loss=0.1134, beats_loss=0.009158, ecapa_loss=0.0001467, whisper_loss=0.1027, over 13946.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01044, ecapa_loss=0.0001382, whisper_loss=0.0885, over 3859711.11 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:43:09,486 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2024-08-21 07:43:13,447 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 23 from LS+wenet, 17 from Vox, 45 from AS 2024-08-21 07:43:39,346 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5144450.0, ans=0.1 2024-08-21 07:43:46,306 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-21 07:43:54,326 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 from AS 2024-08-21 07:44:04,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.276e+01 2.540e+01 2.903e+01 1.576e+02, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 07:44:33,454 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10700, loss[loss=0.1104, beats_loss=0.009015, ecapa_loss=0.000134, whisper_loss=0.1001, over 21884.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001378, whisper_loss=0.08895, over 3875898.88 frames. ], batch size: 85, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:44:36,033 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
31 from LS+wenet, 27 from Vox, 33 from AS 2024-08-21 07:45:09,299 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=5144850.0, ans=0.05 2024-08-21 07:45:09,354 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5144850.0, ans=0.2 2024-08-21 07:45:10,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5144850.0, ans=0.125 2024-08-21 07:45:13,750 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0 2024-08-21 07:45:35,280 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-21 07:45:37,714 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 20 from LS+wenet, 25 from Vox, 31 from AS 2024-08-21 07:45:47,117 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 from AS 2024-08-21 07:45:53,352 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5145150.0, ans=0.125 2024-08-21 07:45:55,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=5145150.0, ans=6.0 2024-08-21 07:46:09,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5145250.0, ans=0.2 2024-08-21 07:46:10,561 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10750, loss[loss=0.09854, beats_loss=0.008646, ecapa_loss=0.0001935, whisper_loss=0.08795, over 12090.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001373, whisper_loss=0.08926, over 3868019.55 frames. 
], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:46:11,359 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5145250.0, ans=0.0 2024-08-21 07:46:25,099 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-21 07:47:16,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.270e+01 2.527e+01 2.757e+01 4.165e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-21 07:47:32,761 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.29 vs. limit=10.0 2024-08-21 07:47:33,674 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5145650.0, ans=0.125 2024-08-21 07:47:47,532 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10800, loss[loss=0.1057, beats_loss=0.01071, ecapa_loss=0.0001202, whisper_loss=0.09379, over 23185.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01052, ecapa_loss=0.0001361, whisper_loss=0.08884, over 3863259.56 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:47:57,388 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS 2024-08-21 07:48:24,193 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5145950.0, ans=0.04949747468305833 2024-08-21 07:48:29,433 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS 2024-08-21 07:48:43,129 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. 
limit=15.0 2024-08-21 07:48:56,336 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5146050.0, ans=0.07 2024-08-21 07:49:20,939 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10850, loss[loss=0.1161, beats_loss=0.009628, ecapa_loss=0.0001309, whisper_loss=0.1052, over 23254.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001365, whisper_loss=0.0898, over 3876099.77 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:49:35,931 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-21 07:49:39,880 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS 2024-08-21 07:49:46,859 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:49:48,699 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5146350.0, ans=0.125 2024-08-21 07:50:10,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5146450.0, ans=0.125 2024-08-21 07:50:23,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.344e+01 2.543e+01 2.878e+01 8.431e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 07:50:26,538 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5146550.0, ans=0.0 2024-08-21 07:50:33,496 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-21 07:50:52,687 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10900, loss[loss=0.1064, beats_loss=0.007401, ecapa_loss=0.0001594, whisper_loss=0.09738, over 19094.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001376, whisper_loss=0.09001, over 3896734.10 frames. ], batch size: 75, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:51:34,122 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2024-08-21 07:51:44,442 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5146950.0, ans=0.1 2024-08-21 07:51:47,127 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 from AS 2024-08-21 07:51:52,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5147050.0, ans=0.1 2024-08-21 07:51:52,745 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2024-08-21 07:52:08,018 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-21 07:52:13,400 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=15.0 2024-08-21 07:52:15,059 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-21 07:52:23,183 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 10950, loss[loss=0.0916, beats_loss=0.01299, ecapa_loss=0.0001182, whisper_loss=0.07742, over 22204.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001379, whisper_loss=0.09005, over 3850585.74 frames. 
], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:52:34,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=5147250.0, ans=0.5 2024-08-21 07:52:35,513 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 33 from LS+wenet, 29 from Vox, 29 from AS 2024-08-21 07:52:42,099 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS 2024-08-21 07:52:58,526 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 from AS 2024-08-21 07:52:59,105 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5147450.0, ans=0.125 2024-08-21 07:53:05,648 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 from AS 2024-08-21 07:53:12,619 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 20 from LS+wenet, 26 from Vox, 39 from AS 2024-08-21 07:53:22,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.337e+01 2.519e+01 2.828e+01 1.066e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-21 07:53:28,637 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 from AS 2024-08-21 07:53:42,373 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS 2024-08-21 07:53:52,847 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11000, loss[loss=0.0886, beats_loss=0.0134, ecapa_loss=0.0001011, whisper_loss=0.07419, over 17890.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.09022, over 3824275.11 frames. ], batch size: 72, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:54:00,063 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 
24 from LS+wenet, 18 from Vox, 38 from AS 2024-08-21 07:54:03,081 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5147750.0, ans=0.2 2024-08-21 07:54:11,036 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 from AS 2024-08-21 07:54:30,845 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5147950.0, ans=0.1 2024-08-21 07:54:32,726 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5147950.0, ans=0.125 2024-08-21 07:54:43,416 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-08-21 07:54:51,217 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-21 07:54:57,507 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS 2024-08-21 07:55:09,232 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 from AS 2024-08-21 07:55:09,830 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5148150.0, ans=0.0 2024-08-21 07:55:16,925 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS 2024-08-21 07:55:21,764 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11050, loss[loss=0.1059, beats_loss=0.0127, ecapa_loss=9.245e-05, whisper_loss=0.09228, over 19756.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001376, whisper_loss=0.0906, over 3833990.83 frames. 
], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:55:22,232 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5148250.0, ans=0.09899494936611666 2024-08-21 07:55:23,828 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5148250.0, ans=0.125 2024-08-21 07:55:25,005 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 20 from LS+wenet, 21 from Vox, 43 from AS 2024-08-21 07:55:37,563 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5148350.0, ans=0.2 2024-08-21 07:55:46,524 INFO [train_multi_KD3.py:845] (2/4) A total of 96 cuts. 27 from LS+wenet, 24 from Vox, 45 from AS 2024-08-21 07:56:01,576 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5148450.0, ans=0.0 2024-08-21 07:56:04,217 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 from AS 2024-08-21 07:56:14,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5148550.0, ans=0.1 2024-08-21 07:56:19,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.275e+01 2.484e+01 2.790e+01 7.658e+01, threshold=4.968e+01, percent-clipped=1.0 2024-08-21 07:56:28,460 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS 2024-08-21 07:56:38,133 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 17 from LS+wenet, 16 from Vox, 19 from AS 2024-08-21 07:56:43,184 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 from AS 2024-08-21 07:56:48,755 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11100, loss[loss=0.1006, beats_loss=0.008034, ecapa_loss=0.0001449, whisper_loss=0.09113, over 18596.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001383, whisper_loss=0.09029, over 3846470.88 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:56:52,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5148750.0, ans=0.1 2024-08-21 07:57:09,385 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2024-08-21 07:57:29,739 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 from AS 2024-08-21 07:57:42,898 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5149050.0, ans=0.2 2024-08-21 07:57:52,382 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 from AS 2024-08-21 07:57:53,250 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5149050.0, ans=0.5 2024-08-21 07:58:09,091 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5149150.0, ans=0.125 2024-08-21 07:58:18,681 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11150, loss[loss=0.1157, beats_loss=0.007857, ecapa_loss=0.0001115, whisper_loss=0.1068, over 13614.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.09075, over 3847665.08 frames. ], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:58:31,882 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 
22 from LS+wenet, 21 from Vox, 26 from AS 2024-08-21 07:58:53,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5149450.0, ans=0.0 2024-08-21 07:58:53,245 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5149450.0, ans=0.0 2024-08-21 07:58:54,705 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5149450.0, ans=0.125 2024-08-21 07:59:17,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.340e+01 2.550e+01 2.884e+01 1.372e+02, threshold=5.100e+01, percent-clipped=2.0 2024-08-21 07:59:29,671 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0 2024-08-21 07:59:46,176 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11200, loss[loss=0.1138, beats_loss=0.009571, ecapa_loss=0.000137, whisper_loss=0.1029, over 19611.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001379, whisper_loss=0.09046, over 3848181.88 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:59:58,139 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 from AS 2024-08-21 08:00:26,591 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5149950.0, ans=0.2 2024-08-21 08:00:29,029 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
25 from LS+wenet, 16 from Vox, 23 from AS 2024-08-21 08:00:47,502 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5150050.0, ans=0.1 2024-08-21 08:01:14,013 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2024-08-21 08:01:16,535 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5150150.0, ans=0.2 2024-08-21 08:01:26,273 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 from AS 2024-08-21 08:01:29,907 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11250, loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001199, whisper_loss=0.08991, over 21373.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001376, whisper_loss=0.0903, over 3825495.98 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:01:30,152 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-21 08:01:37,195 INFO [train_multi_KD3.py:845] (2/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 from AS 2024-08-21 08:01:54,398 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 19 from LS+wenet, 23 from Vox, 33 from AS 2024-08-21 08:02:09,569 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5150350.0, ans=0.125 2024-08-21 08:02:10,181 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. 
limit=15.0 2024-08-21 08:02:14,341 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5150450.0, ans=0.0 2024-08-21 08:02:42,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.342e+01 2.612e+01 2.998e+01 2.607e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-21 08:02:49,440 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5150550.0, ans=0.0 2024-08-21 08:02:57,561 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5150650.0, ans=0.125 2024-08-21 08:03:15,126 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11300, loss[loss=0.1004, beats_loss=0.01295, ecapa_loss=0.000127, whisper_loss=0.08617, over 21737.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.000138, whisper_loss=0.09021, over 3823198.57 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:03:18,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5150750.0, ans=0.1 2024-08-21 08:03:19,250 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 from AS 2024-08-21 08:03:24,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5150750.0, ans=0.2 2024-08-21 08:03:29,604 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5150750.0, ans=0.125 2024-08-21 08:04:02,860 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.70 vs. 
limit=15.0 2024-08-21 08:04:04,703 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5150950.0, ans=0.2 2024-08-21 08:04:05,815 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 19 from LS+wenet, 10 from Vox, 24 from AS 2024-08-21 08:04:25,457 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 38 from LS+wenet, 19 from Vox, 34 from AS 2024-08-21 08:04:33,903 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5151050.0, ans=0.0 2024-08-21 08:04:33,964 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5151050.0, ans=0.125 2024-08-21 08:04:34,158 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.20 vs. limit=10.0 2024-08-21 08:04:42,246 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-21 08:04:44,297 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 from AS 2024-08-21 08:04:53,801 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5151250.0, ans=0.125 2024-08-21 08:04:54,571 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11350, loss[loss=0.09462, beats_loss=0.01207, ecapa_loss=0.0001333, whisper_loss=0.08122, over 21532.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.000137, whisper_loss=0.09002, over 3847912.89 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:05:09,814 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=12.0 2024-08-21 08:05:16,117 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 
24 from LS+wenet, 19 from Vox, 29 from AS 2024-08-21 08:05:28,236 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 21 from LS+wenet, 19 from Vox, 44 from AS 2024-08-21 08:05:31,872 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 14 from LS+wenet, 26 from Vox, 25 from AS 2024-08-21 08:05:32,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5151450.0, ans=0.2 2024-08-21 08:05:45,746 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5151450.0, ans=0.125 2024-08-21 08:05:57,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.232e+01 2.527e+01 2.803e+01 3.759e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-21 08:05:57,698 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 30 from LS+wenet, 12 from Vox, 36 from AS 2024-08-21 08:05:58,128 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5151550.0, ans=0.125 2024-08-21 08:06:00,926 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 from AS 2024-08-21 08:06:02,021 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5151550.0, ans=0.1 2024-08-21 08:06:04,732 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 from AS 2024-08-21 08:06:09,066 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5151650.0, ans=0.125 2024-08-21 08:06:10,050 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 from AS 2024-08-21 08:06:11,136 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=15.0 2024-08-21 08:06:18,883 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5151650.0, ans=0.125 2024-08-21 08:06:26,843 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11400, loss[loss=0.09355, beats_loss=0.009976, ecapa_loss=0.000142, whisper_loss=0.08216, over 19597.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001364, whisper_loss=0.09026, over 3849735.76 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:06:44,816 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 33 from LS+wenet, 18 from Vox, 32 from AS 2024-08-21 08:07:19,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5151950.0, ans=0.0 2024-08-21 08:07:28,576 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=8.0 2024-08-21 08:07:33,859 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5152050.0, ans=0.125 2024-08-21 08:08:00,124 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 from AS 2024-08-21 08:08:06,650 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11450, loss[loss=0.09726, beats_loss=0.00863, ecapa_loss=0.0001448, whisper_loss=0.08718, over 12573.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001368, whisper_loss=0.08931, over 3834542.10 frames. 
], batch size: 51, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:08:07,886 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5152250.0, ans=0.1 2024-08-21 08:08:09,327 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5152250.0, ans=0.1 2024-08-21 08:08:31,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5152350.0, ans=0.2 2024-08-21 08:08:32,303 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5152350.0, ans=0.125 2024-08-21 08:09:00,986 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS 2024-08-21 08:09:14,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.353e+01 2.552e+01 2.800e+01 3.552e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-21 08:09:15,912 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=22.5 2024-08-21 08:09:21,026 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 13 from Vox, 35 from AS 2024-08-21 08:09:23,593 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 from AS 2024-08-21 08:09:29,276 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 30 from LS+wenet, 14 from Vox, 24 from AS 2024-08-21 08:09:46,859 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11500, loss[loss=0.1102, beats_loss=0.01155, ecapa_loss=0.0001239, whisper_loss=0.09745, over 22812.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001372, whisper_loss=0.08943, over 3833421.07 frames. 
], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:10:11,973 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=12.0 2024-08-21 08:10:21,935 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 08:10:28,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5152950.0, ans=0.125 2024-08-21 08:10:31,623 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 08:10:36,748 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5152950.0, ans=0.1 2024-08-21 08:11:00,246 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5153050.0, ans=0.2 2024-08-21 08:11:13,247 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5153150.0, ans=0.2 2024-08-21 08:11:17,989 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 08:11:23,285 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11550, loss[loss=0.1378, beats_loss=0.005698, ecapa_loss=0.0001549, whisper_loss=0.1306, over 19885.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001368, whisper_loss=0.08923, over 3811596.35 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:11:54,767 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5153350.0, ans=0.0 2024-08-21 08:12:09,006 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.67 vs. 
limit=6.0 2024-08-21 08:12:09,753 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 22 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-21 08:12:27,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.384e+01 2.690e+01 2.968e+01 5.018e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-21 08:12:38,576 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 32 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-21 08:12:50,151 INFO [train_multi_KD3.py:845] (2/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-21 08:12:54,721 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5153650.0, ans=0.125 2024-08-21 08:12:57,141 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11600, loss[loss=0.1263, beats_loss=0.007612, ecapa_loss=0.0001474, whisper_loss=0.1172, over 16651.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.0001364, whisper_loss=0.08934, over 3836280.34 frames. 
], batch size: 63, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:13:02,513 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5153750.0, ans=0.125 2024-08-21 08:13:05,124 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5153750.0, ans=0.0 2024-08-21 08:13:24,798 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5153850.0, ans=0.0 2024-08-21 08:13:42,710 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5153950.0, ans=0.1 2024-08-21 08:13:50,582 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=5153950.0, ans=0.95 2024-08-21 08:13:58,297 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5154050.0, ans=0.0 2024-08-21 08:14:26,277 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=12.0 2024-08-21 08:14:35,529 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11650, loss[loss=0.1161, beats_loss=0.009352, ecapa_loss=0.0001426, whisper_loss=0.1054, over 17293.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001371, whisper_loss=0.08999, over 3831612.88 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:14:37,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5154250.0, ans=0.0 2024-08-21 08:14:59,750 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 08:15:06,193 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-21 08:15:21,005 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5154450.0, ans=0.125 2024-08-21 08:15:24,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5154450.0, ans=0.125 2024-08-21 08:15:33,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5154550.0, ans=0.125 2024-08-21 08:15:33,858 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5154550.0, ans=0.09899494936611666 2024-08-21 08:15:40,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.291e+01 2.549e+01 2.927e+01 7.915e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-21 08:15:44,092 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5154550.0, ans=0.1 2024-08-21 08:15:51,261 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 31 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 08:16:11,412 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11700, loss[loss=0.107, beats_loss=0.009939, ecapa_loss=0.0001335, whisper_loss=0.09572, over 22519.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01024, ecapa_loss=0.0001388, whisper_loss=0.09078, over 3818359.61 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:16:13,122 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 
23 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-21 08:16:51,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5154950.0, ans=0.125 2024-08-21 08:16:54,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5154950.0, ans=0.0 2024-08-21 08:17:08,132 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5155050.0, ans=0.125 2024-08-21 08:17:09,831 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.59 vs. limit=22.5 2024-08-21 08:17:12,897 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 08:17:41,259 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11750, loss[loss=0.1285, beats_loss=0.007876, ecapa_loss=0.0001295, whisper_loss=0.1193, over 17178.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01022, ecapa_loss=0.0001389, whisper_loss=0.09117, over 3820513.30 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:17:43,981 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5155250.0, ans=0.0 2024-08-21 08:17:44,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5155250.0, ans=0.0 2024-08-21 08:17:45,372 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.395e+01 2024-08-21 08:18:00,552 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.65 vs. 
limit=12.0 2024-08-21 08:18:14,279 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5155450.0, ans=0.125 2024-08-21 08:18:24,061 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 41 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 08:18:41,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.496e+01 2.849e+01 3.219e+01 3.241e+02, threshold=5.697e+01, percent-clipped=3.0 2024-08-21 08:19:07,367 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11800, loss[loss=0.1112, beats_loss=0.00712, ecapa_loss=0.0001812, whisper_loss=0.1023, over 18526.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0102, ecapa_loss=0.0001389, whisper_loss=0.09111, over 3809702.36 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:19:36,480 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5155850.0, ans=0.2 2024-08-21 08:19:43,391 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5155850.0, ans=0.125 2024-08-21 08:19:46,818 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-21 08:20:00,384 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5155950.0, ans=0.125 2024-08-21 08:20:19,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5156050.0, ans=0.125 2024-08-21 08:20:37,953 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5156150.0, ans=0.125 2024-08-21 08:20:55,112 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
36 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-21 08:20:57,602 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11850, loss[loss=0.1314, beats_loss=0.008113, ecapa_loss=0.000146, whisper_loss=0.1218, over 23120.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.000139, whisper_loss=0.09019, over 3773149.39 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:21:00,081 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-21 08:21:11,425 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-21 08:21:54,028 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 08:22:02,672 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5156450.0, ans=0.125 2024-08-21 08:22:06,592 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5156550.0, ans=0.125 2024-08-21 08:22:07,155 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-08-21 08:22:14,409 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5156550.0, ans=0.125 2024-08-21 08:22:17,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.264e+01 2.462e+01 2.768e+01 4.199e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-21 08:22:25,568 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-21 08:22:37,060 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5156650.0, ans=0.0 2024-08-21 08:22:39,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5156650.0, ans=0.1 2024-08-21 08:22:40,521 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 08:22:44,572 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 08:22:48,157 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11900, loss[loss=0.1025, beats_loss=0.009761, ecapa_loss=0.0001497, whisper_loss=0.09121, over 22760.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01024, ecapa_loss=0.0001396, whisper_loss=0.09013, over 3769618.84 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:22:48,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5156750.0, ans=0.125 2024-08-21 08:22:59,942 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 21 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-21 08:23:21,115 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5156850.0, ans=0.125 2024-08-21 08:23:24,102 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 08:23:34,232 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 08:24:18,127 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:24:26,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5157150.0, ans=0.125 2024-08-21 08:24:32,314 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 11950, loss[loss=0.1086, beats_loss=0.0115, ecapa_loss=0.0001191, whisper_loss=0.09587, over 19580.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01027, ecapa_loss=0.0001395, whisper_loss=0.09058, over 3755385.59 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:24:33,169 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5157250.0, ans=0.1 2024-08-21 08:25:08,748 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 34 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-21 08:25:14,473 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5157350.0, ans=0.07 2024-08-21 08:25:38,281 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5157550.0, ans=0.125 2024-08-21 08:25:38,306 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5157550.0, ans=0.1 2024-08-21 08:25:39,236 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 08:25:45,887 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.09 vs. 
limit=22.5 2024-08-21 08:25:50,798 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.272e+01 2.506e+01 2.845e+01 4.517e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-21 08:26:19,023 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-21 08:26:27,513 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12000, loss[loss=0.1082, beats_loss=0.01069, ecapa_loss=0.0001472, whisper_loss=0.09599, over 21970.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01029, ecapa_loss=0.0001388, whisper_loss=0.09092, over 3794883.16 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:26:27,513 INFO [train_multi_KD3.py:1140] (2/4) Computing validation loss 2024-08-21 08:27:05,268 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on ASR_libri: loss=0.2549, beats_loss=0, ecapa_loss=0.0005016, whisper_loss=0.2499, over 931116.00 frames. 2024-08-21 08:27:23,744 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.1907, 2.8197, 3.1876, 1.8148, 2.1234, 2.1829, 3.0867, 3.0082], device='cuda:2') 2024-08-21 08:27:31,530 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on SV_voxceleb1: loss=0.00396, beats_loss=0, ecapa_loss=0.000396, whisper_loss=0, over 944235.00 frames. 2024-08-21 08:29:17,134 INFO [train_multi_KD3.py:1150] (2/4) Epoch 35, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 08:29:17,138 INFO [train_multi_KD3.py:1156] (2/4) Maximum memory allocated so far is 31859MB 2024-08-21 08:29:48,427 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 08:30:06,234 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 
21 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-21 08:30:20,121 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-21 08:30:25,173 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 21 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-21 08:30:30,903 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-21 08:30:34,461 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 30 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-21 08:30:37,253 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5158150.0, ans=0.0 2024-08-21 08:30:42,946 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.591e+00 2024-08-21 08:30:46,799 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5158150.0, ans=0.0 2024-08-21 08:30:49,446 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12050, loss[loss=0.09846, beats_loss=0.00993, ecapa_loss=0.0001349, whisper_loss=0.08718, over 20246.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01025, ecapa_loss=0.0001383, whisper_loss=0.09154, over 3793239.48 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:31:29,871 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5158450.0, ans=0.1 2024-08-21 08:31:56,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.247e+01 2.409e+01 2.694e+01 3.930e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-21 08:32:04,808 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
21 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-21 08:32:18,679 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5158650.0, ans=0.0 2024-08-21 08:32:31,276 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12100, loss[loss=0.09125, beats_loss=0.01048, ecapa_loss=0.0001273, whisper_loss=0.0795, over 18757.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001382, whisper_loss=0.09177, over 3812364.17 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:32:58,723 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5158850.0, ans=0.1 2024-08-21 08:33:00,065 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-21 08:33:02,718 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5158850.0, ans=0.125 2024-08-21 08:33:06,318 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-21 08:33:08,479 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 08:33:35,011 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2024-08-21 08:33:50,636 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 
26 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-21 08:33:51,550 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5159050.0, ans=0.1 2024-08-21 08:33:56,190 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5159050.0, ans=0.125 2024-08-21 08:34:23,626 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-08-21 08:34:24,136 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12150, loss[loss=0.1003, beats_loss=0.009093, ecapa_loss=0.0001445, whisper_loss=0.08978, over 22548.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001395, whisper_loss=0.09072, over 3818633.02 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:34:55,294 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5159350.0, ans=0.2 2024-08-21 08:35:12,595 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=15.0 2024-08-21 08:35:17,471 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0 2024-08-21 08:35:28,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.279e+01 2.549e+01 2.837e+01 4.060e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-21 08:35:46,009 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 
15 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-21 08:35:47,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=5159650.0, ans=0.025 2024-08-21 08:35:54,435 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12200, loss[loss=0.113, beats_loss=0.009219, ecapa_loss=0.000123, whisper_loss=0.1026, over 14421.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01025, ecapa_loss=0.00014, whisper_loss=0.09102, over 3773806.89 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:35:59,970 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5159750.0, ans=0.125 2024-08-21 08:36:03,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5159750.0, ans=0.125 2024-08-21 08:36:35,817 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5159950.0, ans=0.125 2024-08-21 08:36:43,683 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5159950.0, ans=0.0 2024-08-21 08:36:44,470 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09456772357225418, model_norm_threshold=50.97699737548828 2024-08-21 08:36:44,638 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.739e+04, grad_sumsq=4.739e+04, orig_rms_sq=1.000e+00 2024-08-21 08:37:00,653 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 
27 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-21 08:37:08,046 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5160150.0, ans=0.125 2024-08-21 08:37:08,159 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5160150.0, ans=0.125 2024-08-21 08:37:17,175 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5160150.0, ans=0.125 2024-08-21 08:37:17,195 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5160150.0, ans=0.125 2024-08-21 08:37:23,092 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12250, loss[loss=0.1074, beats_loss=0.01038, ecapa_loss=0.0001414, whisper_loss=0.09558, over 22632.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01025, ecapa_loss=0.0001399, whisper_loss=0.09035, over 3742820.13 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:37:29,074 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5160250.0, ans=0.125 2024-08-21 08:37:40,720 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0 2024-08-21 08:38:03,773 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 16 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-21 08:38:09,145 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 10 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 08:38:12,901 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5160450.0, ans=0.125 2024-08-21 08:38:14,497 INFO [train_multi_KD3.py:845] (2/4) A total of 95 cuts. 
28 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-21 08:38:20,459 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 26 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-21 08:38:25,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.259e+01 2.495e+01 2.764e+01 5.391e+02, threshold=4.989e+01, percent-clipped=1.0 2024-08-21 08:38:34,913 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5160650.0, ans=0.0 2024-08-21 08:38:52,664 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12300, loss[loss=0.1036, beats_loss=0.009119, ecapa_loss=0.0001363, whisper_loss=0.09314, over 23331.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001389, whisper_loss=0.08979, over 3762733.09 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:39:00,719 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5160750.0, ans=0.125 2024-08-21 08:39:02,096 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.67 vs. limit=6.0 2024-08-21 08:39:59,759 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-21 08:40:04,140 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5161050.0, ans=0.1 2024-08-21 08:40:26,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5161250.0, ans=0.125 2024-08-21 08:40:26,983 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12350, loss[loss=0.1035, beats_loss=0.00965, ecapa_loss=0.0001025, whisper_loss=0.09284, over 15000.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001378, whisper_loss=0.08918, over 3782152.14 frames. 
], batch size: 53, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:40:31,733 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.76 vs. limit=10.0 2024-08-21 08:40:57,542 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-21 08:41:02,781 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 12 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-21 08:41:23,135 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 12 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 08:41:30,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.295e+01 2.536e+01 2.877e+01 1.914e+02, threshold=5.073e+01, percent-clipped=2.0 2024-08-21 08:41:30,478 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 08:41:32,200 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 08:41:40,679 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.82 vs. limit=6.0 2024-08-21 08:41:57,273 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-21 08:41:57,742 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12400, loss[loss=0.09985, beats_loss=0.009174, ecapa_loss=0.0001313, whisper_loss=0.08936, over 14440.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001363, whisper_loss=0.08946, over 3783158.24 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:42:54,815 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. 
limit=10.0 2024-08-21 08:43:04,602 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5162050.0, ans=0.125 2024-08-21 08:43:06,150 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5162050.0, ans=0.2 2024-08-21 08:43:14,598 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 08:43:20,553 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-21 08:43:29,018 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 08:43:44,342 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12450, loss[loss=0.09256, beats_loss=0.01125, ecapa_loss=0.0001422, whisper_loss=0.07989, over 19529.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.000136, whisper_loss=0.08964, over 3794556.11 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:43:50,307 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5162250.0, ans=0.125 2024-08-21 08:44:04,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5162350.0, ans=0.025 2024-08-21 08:44:07,181 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 17 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-21 08:44:42,508 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 08:44:44,548 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-21 08:44:54,795 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 08:44:56,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.293e+01 2.503e+01 2.840e+01 4.657e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 08:45:06,731 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5162650.0, ans=0.0 2024-08-21 08:45:17,218 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2024-08-21 08:45:27,619 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12500, loss[loss=0.1053, beats_loss=0.009762, ecapa_loss=0.0001304, whisper_loss=0.09427, over 17790.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001347, whisper_loss=0.08963, over 3836311.96 frames. ], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:45:47,197 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5162850.0, ans=0.0 2024-08-21 08:45:51,510 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5162850.0, ans=0.125 2024-08-21 08:46:01,914 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 08:46:21,751 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=12.0 2024-08-21 08:46:31,722 INFO [train_multi_KD3.py:845] (2/4) A total of 50 cuts. 17 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-21 08:46:35,543 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-21 08:46:37,263 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.68 vs. 
limit=15.0 2024-08-21 08:47:01,187 INFO [train_multi_KD3.py:845] (2/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-21 08:47:02,307 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:47:06,443 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5163150.0, ans=0.95 2024-08-21 08:47:10,057 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5163250.0, ans=0.0 2024-08-21 08:47:10,765 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12550, loss[loss=0.1125, beats_loss=0.008733, ecapa_loss=0.0001476, whisper_loss=0.1023, over 22194.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.000135, whisper_loss=0.09058, over 3832896.74 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:47:18,081 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-08-21 08:47:22,366 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. 
limit=10.0 2024-08-21 08:47:35,528 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5163350.0, ans=10.0 2024-08-21 08:47:35,632 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5163350.0, ans=0.2 2024-08-21 08:48:01,039 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5163450.0, ans=0.0 2024-08-21 08:48:08,161 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2024-08-21 08:48:10,992 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5163450.0, ans=0.1 2024-08-21 08:48:22,647 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 08:48:25,183 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 08:48:26,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.391e+01 2.623e+01 3.039e+01 4.282e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-21 08:48:41,828 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 08:48:56,022 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12600, loss[loss=0.0803, beats_loss=0.01202, ecapa_loss=0.0001281, whisper_loss=0.067, over 19956.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001357, whisper_loss=0.09063, over 3807473.26 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:48:56,505 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 
25 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-21 08:49:02,560 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5163750.0, ans=0.07 2024-08-21 08:49:10,203 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. limit=10.0 2024-08-21 08:49:25,352 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 23 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-21 08:49:26,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5163850.0, ans=0.125 2024-08-21 08:49:30,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5163950.0, ans=0.015 2024-08-21 08:49:45,153 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 15 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-21 08:49:46,134 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5163950.0, ans=0.125 2024-08-21 08:49:58,554 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-21 08:50:27,511 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12650, loss[loss=0.08328, beats_loss=0.01049, ecapa_loss=0.0001363, whisper_loss=0.07143, over 19293.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001357, whisper_loss=0.09027, over 3772811.04 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:50:31,580 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 
19 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-21 08:50:45,581 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5164350.0, ans=0.1 2024-08-21 08:50:55,633 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5164350.0, ans=0.1 2024-08-21 08:51:18,888 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5164450.0, ans=0.0 2024-08-21 08:51:20,681 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5164550.0, ans=0.0 2024-08-21 08:51:30,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.313e+01 2.541e+01 2.786e+01 6.490e+01, threshold=5.082e+01, percent-clipped=1.0 2024-08-21 08:51:31,531 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5164550.0, ans=0.0 2024-08-21 08:51:33,249 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 21 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-21 08:51:58,199 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12700, loss[loss=0.1017, beats_loss=0.009081, ecapa_loss=0.0001374, whisper_loss=0.09124, over 15424.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001358, whisper_loss=0.09017, over 3793496.93 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:52:23,761 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5164850.0, ans=0.09899494936611666 2024-08-21 08:52:41,031 INFO [train_multi_KD3.py:845] (2/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-21 08:52:43,541 INFO [train_multi_KD3.py:845] (2/4) A total of 65 cuts. 
17 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-21 08:52:52,407 INFO [train_multi_KD3.py:845] (2/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 08:52:54,340 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-21 08:53:22,900 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 18 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 08:53:33,828 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0630233883857727, model_norm_threshold=50.820472717285156 2024-08-21 08:53:33,995 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.448e+05, grad_sumsq=2.272e+07, orig_rms_sq=1.077e-02 2024-08-21 08:53:37,270 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5165150.0, ans=0.1 2024-08-21 08:53:48,957 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12750, loss[loss=0.1245, beats_loss=0.007618, ecapa_loss=0.0001718, whisper_loss=0.1152, over 21575.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001365, whisper_loss=0.0896, over 3749084.37 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:54:13,017 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 26 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-21 08:54:16,710 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.46 vs. 
limit=15.0 2024-08-21 08:54:21,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5165350.0, ans=0.0 2024-08-21 08:54:22,972 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5165350.0, ans=0.125 2024-08-21 08:54:40,826 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 24 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-21 08:54:51,777 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5165450.0, ans=0.0 2024-08-21 08:55:04,865 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5165550.0, ans=0.1 2024-08-21 08:55:07,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.308e+01 2.534e+01 2.848e+01 8.064e+02, threshold=5.067e+01, percent-clipped=1.0 2024-08-21 08:55:42,207 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5165750.0, ans=0.125 2024-08-21 08:55:43,315 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12800, loss[loss=0.1067, beats_loss=0.01149, ecapa_loss=0.0001227, whisper_loss=0.09402, over 20635.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001368, whisper_loss=0.08936, over 3767290.57 frames. 
], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:55:55,642 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.697e+05 2024-08-21 08:56:36,523 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5165950.0, ans=0.125 2024-08-21 08:56:44,094 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5165950.0, ans=0.1 2024-08-21 08:56:57,714 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5166050.0, ans=0.0 2024-08-21 08:57:00,414 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5166050.0, ans=0.0 2024-08-21 08:57:14,708 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5166150.0, ans=0.1 2024-08-21 08:57:16,609 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5166150.0, ans=0.0 2024-08-21 08:57:23,444 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5166150.0, ans=0.1 2024-08-21 08:57:35,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5166250.0, ans=0.0 2024-08-21 08:57:36,104 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12850, loss[loss=0.0935, beats_loss=0.013, ecapa_loss=0.0001122, whisper_loss=0.07938, over 20145.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001376, whisper_loss=0.08955, over 3766263.20 frames. 
], batch size: 81, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:57:50,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=5166250.0, ans=0.025 2024-08-21 08:57:53,374 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5166250.0, ans=0.04949747468305833 2024-08-21 08:58:23,231 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5166350.0, ans=0.125 2024-08-21 08:58:28,284 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5166450.0, ans=0.0 2024-08-21 08:58:32,053 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 20 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-21 08:58:56,498 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 37 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 08:59:02,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.226e+01 2.436e+01 2.740e+01 3.525e+01, threshold=4.872e+01, percent-clipped=0.0 2024-08-21 08:59:36,018 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12900, loss[loss=0.07923, beats_loss=0.01055, ecapa_loss=0.0001599, whisper_loss=0.06708, over 14350.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001375, whisper_loss=0.09001, over 3767494.58 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:59:44,265 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-08-21 09:00:18,380 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5166850.0, ans=0.125 2024-08-21 09:00:44,402 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 
21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 09:00:47,608 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5167050.0, ans=0.1 2024-08-21 09:00:56,166 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5167050.0, ans=0.125 2024-08-21 09:01:19,097 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5167150.0, ans=0.0 2024-08-21 09:01:41,508 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 12950, loss[loss=0.1182, beats_loss=0.0101, ecapa_loss=0.0001291, whisper_loss=0.1068, over 20830.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001371, whisper_loss=0.09041, over 3774264.74 frames. ], batch size: 80, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:01:57,759 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5167250.0, ans=0.125 2024-08-21 09:03:06,427 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 09:03:14,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.293e+01 2.527e+01 2.916e+01 2.821e+02, threshold=5.054e+01, percent-clipped=3.0 2024-08-21 09:03:18,384 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:03:53,540 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13000, loss[loss=0.09251, beats_loss=0.01122, ecapa_loss=0.0001652, whisper_loss=0.07964, over 13843.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001374, whisper_loss=0.0903, over 3813068.10 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:03:55,369 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5167750.0, ans=0.0 2024-08-21 09:03:55,427 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5167750.0, ans=0.125 2024-08-21 09:04:21,907 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-21 09:04:25,168 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5167850.0, ans=0.0 2024-08-21 09:04:45,109 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5167950.0, ans=0.1 2024-08-21 09:04:53,755 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5167950.0, ans=0.125 2024-08-21 09:05:25,383 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5168050.0, ans=0.125 2024-08-21 09:05:28,934 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 09:05:47,087 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13050, loss[loss=0.09904, beats_loss=0.009762, ecapa_loss=0.0001564, whisper_loss=0.08772, over 20740.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001391, whisper_loss=0.09, over 3828675.25 frames. ], batch size: 86, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:06:13,394 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 
13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 09:06:17,243 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.221e+01 2024-08-21 09:06:49,643 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5168550.0, ans=0.125 2024-08-21 09:06:53,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.244e+01 2.565e+01 2.822e+01 8.760e+01, threshold=5.130e+01, percent-clipped=2.0 2024-08-21 09:06:57,416 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 20 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-21 09:07:26,639 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13100, loss[loss=0.1041, beats_loss=0.0111, ecapa_loss=0.0001025, whisper_loss=0.09196, over 15712.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01051, ecapa_loss=0.0001395, whisper_loss=0.08867, over 3775723.22 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:07:32,931 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 40 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 09:07:35,509 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-21 09:07:42,575 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5168750.0, ans=0.1 2024-08-21 09:07:49,376 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 09:08:07,001 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5168850.0, ans=0.0 2024-08-21 09:08:13,163 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
23 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 09:08:44,171 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5169050.0, ans=0.1 2024-08-21 09:09:31,227 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13150, loss[loss=0.1041, beats_loss=0.009135, ecapa_loss=0.0001819, whisper_loss=0.09315, over 21173.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001396, whisper_loss=0.08892, over 3776309.17 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:09:33,092 INFO [train_multi_KD3.py:845] (2/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 09:09:36,796 WARNING [optim.py:496] (2/4) Scaling gradients by 0.0913950502872467, model_norm_threshold=51.30171203613281 2024-08-21 09:09:36,965 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.527e+05, grad_sumsq=4.631e+04, orig_rms_sq=3.298e+00 2024-08-21 09:10:15,699 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.83 vs. 
limit=22.5 2024-08-21 09:10:28,915 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5169450.0, ans=0.0 2024-08-21 09:10:59,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.235e+01 2.478e+01 2.768e+01 5.613e+02, threshold=4.956e+01, percent-clipped=2.0 2024-08-21 09:11:23,493 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5169650.0, ans=0.125 2024-08-21 09:11:31,983 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5169650.0, ans=0.125 2024-08-21 09:11:37,589 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13200, loss[loss=0.1118, beats_loss=0.008169, ecapa_loss=0.0001573, whisper_loss=0.102, over 23789.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01048, ecapa_loss=0.000139, whisper_loss=0.08872, over 3770718.58 frames. ], batch size: 93, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:11:44,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5169750.0, ans=0.125 2024-08-21 09:12:17,964 INFO [train_multi_KD3.py:845] (2/4) A total of 87 cuts. 
30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 09:12:18,278 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5169850.0, ans=0.1 2024-08-21 09:13:16,062 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.997e+00 2024-08-21 09:13:35,867 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5170150.0, ans=0.2 2024-08-21 09:13:41,448 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13250, loss[loss=0.101, beats_loss=0.01201, ecapa_loss=0.0001392, whisper_loss=0.08763, over 22714.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.0885, over 3749329.44 frames. ], batch size: 94, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:14:46,243 INFO [train_multi_KD3.py:845] (2/4) A total of 79 cuts. 34 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 09:15:03,680 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5170550.0, ans=0.125 2024-08-21 09:15:16,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.287e+01 2.552e+01 2.926e+01 1.195e+02, threshold=5.104e+01, percent-clipped=1.0 2024-08-21 09:15:27,543 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5170650.0, ans=0.125 2024-08-21 09:15:27,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5170650.0, ans=0.1 2024-08-21 09:15:46,002 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 09:15:50,382 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-21 09:15:53,596 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13300, loss[loss=0.1033, beats_loss=0.01216, ecapa_loss=0.0001492, whisper_loss=0.08963, over 22361.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01047, ecapa_loss=0.0001391, whisper_loss=0.08854, over 3748844.07 frames. ], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:16:11,867 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 09:16:35,107 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5170850.0, ans=0.125 2024-08-21 09:16:39,735 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5170850.0, ans=0.0 2024-08-21 09:16:41,905 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 41 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-21 09:16:55,937 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-21 09:17:16,149 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5171050.0, ans=0.2 2024-08-21 09:17:21,777 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 34 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-21 09:17:24,006 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 15 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-21 09:17:36,475 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:17:53,267 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5171150.0, ans=0.0 2024-08-21 09:17:57,133 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 
10 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-21 09:17:58,194 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13350, loss[loss=0.06281, beats_loss=0.01162, ecapa_loss=0.0001533, whisper_loss=0.04965, over 14140.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01054, ecapa_loss=0.0001377, whisper_loss=0.08827, over 3759076.50 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:18:04,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=12.0 2024-08-21 09:18:12,276 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5171250.0, ans=0.125 2024-08-21 09:18:33,275 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5171350.0, ans=0.1 2024-08-21 09:19:14,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5171550.0, ans=0.2 2024-08-21 09:19:23,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.340e+01 2.564e+01 2.896e+01 2.938e+02, threshold=5.128e+01, percent-clipped=2.0 2024-08-21 09:19:38,751 WARNING [optim.py:496] (2/4) Scaling gradients by 0.09627310186624527, model_norm_threshold=51.2801628112793 2024-08-21 09:19:38,916 INFO [optim.py:564] (2/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.670e+04, grad_sumsq=3.670e+04, orig_rms_sq=1.000e+00 2024-08-21 09:19:52,166 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.64 vs. 
limit=15.0 2024-08-21 09:19:59,574 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13400, loss[loss=0.1057, beats_loss=0.008925, ecapa_loss=0.0001453, whisper_loss=0.09533, over 12847.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.01065, ecapa_loss=0.0001368, whisper_loss=0.08791, over 3762207.55 frames. ], batch size: 51, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:20:20,392 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-21 09:20:27,379 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 09:20:32,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2024-08-21 09:20:35,491 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5171850.0, ans=0.04949747468305833 2024-08-21 09:20:50,798 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 09:21:00,353 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 09:21:18,172 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-21 09:22:00,278 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2024-08-21 09:22:00,640 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13450, loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.000153, whisper_loss=0.08993, over 22297.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01056, ecapa_loss=0.0001377, whisper_loss=0.08856, over 3767970.29 frames. 
], batch size: 93, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:22:20,160 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5172350.0, ans=0.2 2024-08-21 09:22:22,544 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5172350.0, ans=0.125 2024-08-21 09:22:30,268 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5172350.0, ans=0.2 2024-08-21 09:23:05,785 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:23:19,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.314e+01 2.499e+01 2.868e+01 5.327e+02, threshold=4.997e+01, percent-clipped=2.0 2024-08-21 09:23:21,054 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5172550.0, ans=0.1 2024-08-21 09:23:28,717 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2024-08-21 09:23:54,383 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13500, loss[loss=0.1155, beats_loss=0.009196, ecapa_loss=0.0001566, whisper_loss=0.1047, over 16983.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01058, ecapa_loss=0.0001376, whisper_loss=0.08843, over 3786684.47 frames. ], batch size: 67, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:24:03,877 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-21 09:24:07,480 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.52 vs. 
limit=22.5 2024-08-21 09:24:10,080 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5172750.0, ans=0.0 2024-08-21 09:24:14,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5172750.0, ans=0.0 2024-08-21 09:24:35,517 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5172850.0, ans=0.0 2024-08-21 09:24:37,442 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 09:25:04,842 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5173050.0, ans=0.2 2024-08-21 09:25:11,725 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5173050.0, ans=0.0 2024-08-21 09:25:11,739 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5173050.0, ans=0.0 2024-08-21 09:25:21,153 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 09:25:35,482 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5173150.0, ans=0.125 2024-08-21 09:25:52,233 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13550, loss[loss=0.1103, beats_loss=0.01078, ecapa_loss=0.0001167, whisper_loss=0.0983, over 17815.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001376, whisper_loss=0.08922, over 3806969.41 frames. ], batch size: 70, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:26:06,020 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=15.0 2024-08-21 09:26:08,403 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5173250.0, ans=0.125 2024-08-21 09:26:22,416 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 09:26:45,902 INFO [train_multi_KD3.py:845] (2/4) A total of 82 cuts. 33 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 09:27:01,524 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 09:27:01,789 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5173450.0, ans=0.125 2024-08-21 09:27:16,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.212e+01 2.430e+01 2.813e+01 4.061e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 09:27:35,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5173650.0, ans=0.2 2024-08-21 09:27:53,918 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13600, loss[loss=0.08312, beats_loss=0.01095, ecapa_loss=0.0001417, whisper_loss=0.07076, over 14587.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001379, whisper_loss=0.08972, over 3781247.32 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:27:55,874 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-21 09:28:53,735 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 09:28:56,184 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-21 09:29:04,325 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-21 09:29:06,295 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 09:29:16,084 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5174050.0, ans=0.05 2024-08-21 09:29:46,065 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 20 from LS+wenet, 19 from Vox, 13 fro AS 2024-08-21 09:29:53,541 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-21 09:29:56,138 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 09:29:59,837 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13650, loss[loss=0.08416, beats_loss=0.01266, ecapa_loss=9.826e-05, whisper_loss=0.07052, over 14808.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001391, whisper_loss=0.0896, over 3794160.54 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:30:18,074 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-21 09:30:30,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5174350.0, ans=0.1 2024-08-21 09:30:34,194 INFO [train_multi_KD3.py:845] (2/4) A total of 52 cuts. 
16 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 09:30:49,356 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5174450.0, ans=0.05 2024-08-21 09:31:07,407 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2024-08-21 09:31:17,167 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 09:31:20,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=5174550.0, ans=0.2 2024-08-21 09:31:26,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.278e+01 2.472e+01 2.664e+01 8.830e+01, threshold=4.945e+01, percent-clipped=1.0 2024-08-21 09:31:50,367 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-21 09:32:04,222 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13700, loss[loss=0.1091, beats_loss=0.01287, ecapa_loss=0.0001149, whisper_loss=0.09512, over 21946.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0105, ecapa_loss=0.0001387, whisper_loss=0.08897, over 3769123.32 frames. ], batch size: 88, lr: 1.73e-03, grad_scale: 1.152921504606847e+18 2024-08-21 09:32:45,545 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5174850.0, ans=0.1 2024-08-21 09:33:29,172 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=12.0 2024-08-21 09:33:33,859 INFO [train_multi_KD3.py:845] (2/4) A total of 78 cuts. 
24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 09:33:36,738 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5175050.0, ans=0.125 2024-08-21 09:33:36,948 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-21 09:33:57,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5175150.0, ans=0.125 2024-08-21 09:34:04,755 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13750, loss[loss=0.0874, beats_loss=0.01166, ecapa_loss=0.0001405, whisper_loss=0.07433, over 16005.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001387, whisper_loss=0.08966, over 3808831.34 frames. ], batch size: 68, lr: 1.73e-03, grad_scale: 1.152921504606847e+18 2024-08-21 09:34:13,867 INFO [train_multi_KD3.py:845] (2/4) A total of 60 cuts. 15 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 09:34:22,327 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 09:34:47,791 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5175350.0, ans=0.07 2024-08-21 09:34:57,119 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5175450.0, ans=0.125 2024-08-21 09:35:20,384 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. 
limit=10.0 2024-08-21 09:35:27,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.292e+01 2.541e+01 2.786e+01 7.539e+01, threshold=5.082e+01, percent-clipped=3.0 2024-08-21 09:35:31,489 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5175550.0, ans=0.1 2024-08-21 09:35:34,692 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5175550.0, ans=0.125 2024-08-21 09:35:55,624 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.331e+00 2024-08-21 09:36:03,844 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5175650.0, ans=0.125 2024-08-21 09:36:06,927 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13800, loss[loss=0.09929, beats_loss=0.009983, ecapa_loss=0.0001032, whisper_loss=0.08827, over 16384.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001386, whisper_loss=0.08938, over 3793893.04 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:36:07,910 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5175750.0, ans=0.2 2024-08-21 09:36:41,372 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5175850.0, ans=0.125 2024-08-21 09:36:41,373 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5175850.0, ans=0.0 2024-08-21 09:37:59,970 INFO [train_multi_KD3.py:845] (2/4) A total of 90 cuts. 
31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-21 09:38:21,660 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13850, loss[loss=0.1053, beats_loss=0.00893, ecapa_loss=0.0001457, whisper_loss=0.09489, over 18158.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001374, whisper_loss=0.08946, over 3821616.05 frames. ], batch size: 71, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:38:26,077 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-21 09:38:29,288 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2024-08-21 09:38:47,388 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5176350.0, ans=0.0 2024-08-21 09:38:50,210 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-21 09:39:12,055 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5176350.0, ans=0.1 2024-08-21 09:39:22,299 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-21 09:39:46,816 INFO [train_multi_KD3.py:845] (2/4) A total of 83 cuts. 15 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-21 09:39:54,762 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5176550.0, ans=0.0 2024-08-21 09:39:58,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.312e+01 2.430e+01 2.795e+01 3.774e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 09:40:29,819 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 
22 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-21 09:40:33,017 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13900, loss[loss=0.09856, beats_loss=0.011, ecapa_loss=0.0001634, whisper_loss=0.08593, over 15326.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001371, whisper_loss=0.08998, over 3851051.07 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:40:50,900 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5176750.0, ans=0.125 2024-08-21 09:40:51,317 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-21 09:41:13,998 INFO [train_multi_KD3.py:845] (2/4) A total of 54 cuts. 11 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-21 09:41:42,911 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-21 09:41:48,037 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5177050.0, ans=0.0 2024-08-21 09:41:50,736 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 09:42:17,143 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 09:42:31,826 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 13950, loss[loss=0.0965, beats_loss=0.01254, ecapa_loss=0.0001217, whisper_loss=0.08275, over 20656.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001384, whisper_loss=0.0899, over 3820287.05 frames. 
], batch size: 84, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:43:03,857 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5177350.0, ans=0.2 2024-08-21 09:43:05,466 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5177350.0, ans=0.0 2024-08-21 09:43:16,166 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 15 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-21 09:43:23,643 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=12.0 2024-08-21 09:43:27,820 INFO [train_multi_KD3.py:845] (2/4) A total of 94 cuts. 37 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 09:43:43,779 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5177550.0, ans=0.0 2024-08-21 09:43:58,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.301e+01 2.643e+01 2.947e+01 4.607e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-21 09:44:03,935 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-21 09:44:31,669 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14000, loss[loss=0.1014, beats_loss=0.009692, ecapa_loss=0.000145, whisper_loss=0.09029, over 13027.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001377, whisper_loss=0.08964, over 3796040.93 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:44:54,504 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 23 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-21 09:44:58,704 INFO [train_multi_KD3.py:845] (2/4) A total of 57 cuts. 
16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 09:45:10,985 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5177850.0, ans=0.1 2024-08-21 09:45:13,052 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 09:45:13,194 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5177850.0, ans=0.125 2024-08-21 09:45:13,328 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.862e+01 2024-08-21 09:45:15,677 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5177950.0, ans=0.125 2024-08-21 09:45:17,177 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5177950.0, ans=0.2 2024-08-21 09:45:26,031 INFO [train_multi_KD3.py:845] (2/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 09:46:23,931 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14050, loss[loss=0.09971, beats_loss=0.01027, ecapa_loss=0.0001442, whisper_loss=0.088, over 22699.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001372, whisper_loss=0.08997, over 3787630.48 frames. ], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:46:30,851 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5178250.0, ans=0.125 2024-08-21 09:46:40,963 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 09:47:07,050 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 
23 from LS+wenet, 7 from Vox, 37 fro AS 2024-08-21 09:47:43,658 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=5178550.0, ans=0.02 2024-08-21 09:47:48,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.318e+01 2.539e+01 2.809e+01 4.112e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-21 09:47:58,474 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5178650.0, ans=0.125 2024-08-21 09:48:07,670 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=5178650.0, ans=10.0 2024-08-21 09:48:16,453 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14100, loss[loss=0.09851, beats_loss=0.008934, ecapa_loss=0.0001455, whisper_loss=0.08812, over 20132.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001369, whisper_loss=0.09049, over 3786442.08 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:48:27,117 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5178750.0, ans=0.125 2024-08-21 09:48:28,500 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5178750.0, ans=0.2 2024-08-21 09:48:48,728 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5178850.0, ans=0.125 2024-08-21 09:49:22,357 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 09:49:29,282 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 
29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 09:49:29,542 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5179050.0, ans=0.0 2024-08-21 09:49:36,775 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5179150.0, ans=0.125 2024-08-21 09:50:01,591 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14150, loss[loss=0.1195, beats_loss=0.008462, ecapa_loss=0.0001407, whisper_loss=0.1097, over 21728.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001372, whisper_loss=0.08996, over 3774869.93 frames. ], batch size: 84, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:50:13,953 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 28 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 09:50:23,200 INFO [train_multi_KD3.py:845] (2/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 09:50:38,690 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 14 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-21 09:51:19,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.247e+01 2.512e+01 2.809e+01 5.073e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 09:51:37,438 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5179650.0, ans=0.125 2024-08-21 09:51:52,974 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14200, loss[loss=0.07731, beats_loss=0.01138, ecapa_loss=0.0001127, whisper_loss=0.0648, over 13402.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001373, whisper_loss=0.09016, over 3742684.87 frames. 
], batch size: 50, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:51:54,022 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5179750.0, ans=0.1 2024-08-21 09:52:11,034 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5179750.0, ans=0.0 2024-08-21 09:52:11,600 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-21 09:52:28,201 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5179850.0, ans=0.0 2024-08-21 09:52:49,204 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.85 vs. limit=15.0 2024-08-21 09:53:11,108 INFO [train_multi_KD3.py:845] (2/4) A total of 72 cuts. 17 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-21 09:53:22,548 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5180050.0, ans=0.1 2024-08-21 09:53:35,861 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5180150.0, ans=0.5 2024-08-21 09:53:55,777 INFO [train_multi_KD3.py:845] (2/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-21 09:53:56,777 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14250, loss[loss=0.1255, beats_loss=0.008086, ecapa_loss=0.0001504, whisper_loss=0.1159, over 17594.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001366, whisper_loss=0.08986, over 3778549.25 frames. 
], batch size: 71, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:54:20,216 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5180350.0, ans=0.125 2024-08-21 09:54:29,269 INFO [train_multi_KD3.py:845] (2/4) A total of 53 cuts. 17 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-21 09:54:34,902 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5180350.0, ans=0.1 2024-08-21 09:54:47,883 INFO [train_multi_KD3.py:845] (2/4) A total of 64 cuts. 15 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-21 09:54:55,961 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5180450.0, ans=0.0 2024-08-21 09:55:01,788 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5180450.0, ans=0.2 2024-08-21 09:55:26,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.191e+01 2.440e+01 2.689e+01 6.038e+01, threshold=4.881e+01, percent-clipped=1.0 2024-08-21 09:55:53,893 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 09:55:54,199 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5180650.0, ans=10.0 2024-08-21 09:56:03,532 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14300, loss[loss=0.1045, beats_loss=0.008827, ecapa_loss=0.0001326, whisper_loss=0.09438, over 18916.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001355, whisper_loss=0.08941, over 3790350.57 frames. 
], batch size: 72, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:56:15,175 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2024-08-21 09:56:26,899 INFO [train_multi_KD3.py:845] (2/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-21 09:56:29,774 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5180850.0, ans=0.0 2024-08-21 09:56:55,656 INFO [train_multi_KD3.py:845] (2/4) A total of 92 cuts. 26 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-21 09:57:21,242 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5181050.0, ans=0.0 2024-08-21 09:57:42,978 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5181050.0, ans=0.125 2024-08-21 09:57:44,811 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 15 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-21 09:57:52,049 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-21 09:58:01,483 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5181150.0, ans=0.125 2024-08-21 09:58:09,809 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14350, loss[loss=0.08957, beats_loss=0.01214, ecapa_loss=0.0001218, whisper_loss=0.07621, over 16950.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001368, whisper_loss=0.08939, over 3812252.04 frames. 
], batch size: 67, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:58:29,596 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5181250.0, ans=0.5 2024-08-21 09:59:21,998 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 09:59:30,888 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2024-08-21 09:59:33,366 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5181550.0, ans=0.025 2024-08-21 09:59:43,850 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5181550.0, ans=0.07 2024-08-21 09:59:44,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.249e+01 2.480e+01 2.767e+01 3.884e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-21 10:00:08,342 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-21 10:00:10,432 INFO [train_multi_KD3.py:845] (2/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 10:00:19,258 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14400, loss[loss=0.1013, beats_loss=0.01169, ecapa_loss=0.0001416, whisper_loss=0.08822, over 22190.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001359, whisper_loss=0.08949, over 3797054.88 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:00:37,173 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.93 vs. 
limit=12.0
2024-08-21 10:00:51,683 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0
2024-08-21 10:00:53,906 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5181850.0, ans=0.125
2024-08-21 10:02:00,965 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5182150.0, ans=0.04949747468305833
2024-08-21 10:02:27,949 INFO [train_multi_KD3.py:845] (2/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS
2024-08-21 10:02:28,999 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14450, loss[loss=0.1017, beats_loss=0.009823, ecapa_loss=0.000118, whisper_loss=0.09073, over 14948.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001366, whisper_loss=0.0888, over 3753817.37 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:02:33,672 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0
2024-08-21 10:03:07,291 INFO [train_multi_KD3.py:845] (2/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS
2024-08-21 10:03:41,011 INFO [train_multi_KD3.py:845] (2/4) A total of 69 cuts. 24 from LS+wenet, 12 from Vox, 33 from AS
2024-08-21 10:04:02,413 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5182550.0, ans=0.1
2024-08-21 10:04:03,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.252e+01 2.493e+01 2.789e+01 4.722e+01, threshold=4.987e+01, percent-clipped=0.0
2024-08-21 10:04:13,017 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0
2024-08-21 10:04:13,142 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0
2024-08-21 10:04:36,974 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0
2024-08-21 10:04:40,406 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14500, loss[loss=0.08525, beats_loss=0.01072, ecapa_loss=0.0001732, whisper_loss=0.0728, over 19685.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01038, ecapa_loss=0.0001372, whisper_loss=0.08844, over 3757398.65 frames. ], batch size: 82, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:04:47,588 INFO [train_multi_KD3.py:845] (2/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 from AS
2024-08-21 10:05:00,633 INFO [train_multi_KD3.py:845] (2/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 from AS
2024-08-21 10:05:38,049 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5182950.0, ans=0.125
2024-08-21 10:06:46,947 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5183150.0, ans=0.125
2024-08-21 10:06:50,541 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14550, loss[loss=0.08537, beats_loss=0.01063, ecapa_loss=0.0001087, whisper_loss=0.07366, over 18605.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01025, ecapa_loss=0.000139, whisper_loss=0.08948, over 3767347.77 frames. ], batch size: 72, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:07:31,682 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5183350.0, ans=0.125
2024-08-21 10:07:42,804 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.42 vs. limit=22.5
2024-08-21 10:07:55,340 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5183450.0, ans=0.0
2024-08-21 10:07:55,547 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-08-21 10:07:57,834 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5183450.0, ans=0.1
2024-08-21 10:08:25,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.316e+01 2.552e+01 2.879e+01 5.154e+01, threshold=5.103e+01, percent-clipped=1.0
2024-08-21 10:08:44,939 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS
2024-08-21 10:08:53,926 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.70 vs. limit=10.0
2024-08-21 10:08:53,926 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=5183650.0, ans=10.0
2024-08-21 10:09:01,748 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14600, loss[loss=0.1095, beats_loss=0.01024, ecapa_loss=0.0001149, whisper_loss=0.09815, over 17100.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.08898, over 3781264.78 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:09:02,396 INFO [train_multi_KD3.py:845] (2/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 from AS
2024-08-21 10:09:24,373 INFO [train_multi_KD3.py:845] (2/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 from AS
2024-08-21 10:09:30,341 INFO [scaling.py:1120] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 10:10:03,920 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5183950.0, ans=0.125
2024-08-21 10:10:42,753 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0
2024-08-21 10:10:52,328 INFO [train_multi_KD3.py:845] (2/4) A total of 91 cuts. 17 from LS+wenet, 27 from Vox, 47 from AS
2024-08-21 10:10:58,481 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5184150.0, ans=0.1
2024-08-21 10:11:02,599 INFO [train_multi_KD3.py:845] (2/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS
2024-08-21 10:11:03,540 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14650, loss[loss=0.1053, beats_loss=0.01078, ecapa_loss=0.0001532, whisper_loss=0.09299, over 21826.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01035, ecapa_loss=0.0001393, whisper_loss=0.08848, over 3779220.11 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:11:13,282 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5184250.0, ans=0.1
2024-08-21 10:11:40,170 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5184350.0, ans=0.0
2024-08-21 10:11:49,054 INFO [train_multi_KD3.py:845] (2/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 from AS
2024-08-21 10:12:13,790 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5184550.0, ans=0.1
2024-08-21 10:12:24,008 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.43 vs. limit=10.0
2024-08-21 10:12:26,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.280e+01 2.543e+01 2.836e+01 3.661e+01, threshold=5.086e+01, percent-clipped=0.0
2024-08-21 10:13:01,683 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14700, loss[loss=0.09171, beats_loss=0.009862, ecapa_loss=0.0001304, whisper_loss=0.08055, over 21984.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01029, ecapa_loss=0.0001413, whisper_loss=0.08951, over 3807682.43 frames. ], batch size: 88, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:13:11,191 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0
2024-08-21 10:13:48,648 INFO [train_multi_KD3.py:845] (2/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 from AS
2024-08-21 10:14:33,769 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5185050.0, ans=0.125
2024-08-21 10:14:36,888 INFO [train_multi_KD3.py:845] (2/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 from AS
2024-08-21 10:15:04,798 INFO [train_multi_KD3.py:1117] (2/4) Epoch 35, batch 14750, loss[loss=0.07841, beats_loss=0.01293, ecapa_loss=0.0001391, whisper_loss=0.06408, over 12509.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001404, whisper_loss=0.08939, over 3781333.77 frames. ], batch size: 49, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 10:15:06,506 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5185250.0, ans=0.1
2024-08-21 10:15:09,470 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5185250.0, ans=0.125
2024-08-21 10:15:19,523 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 from AS
2024-08-21 10:15:25,968 INFO [scaling.py:1024] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5
2024-08-21 10:15:33,459 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5185350.0, ans=0.0
2024-08-21 10:15:39,721 INFO [scaling.py:1024] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.95 vs. limit=5.0
2024-08-21 10:15:55,324 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5185350.0, ans=0.125
2024-08-21 10:16:03,322 INFO [train_multi_KD3.py:845] (2/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS
2024-08-21 10:16:06,381 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5185450.0, ans=0.0
2024-08-21 10:16:10,386 INFO [train_multi_KD3.py:845] (2/4) A total of 80 cuts. 28 from LS+wenet, 11 from Vox, 41 from AS
2024-08-21 10:16:18,319 INFO [train_multi_KD3.py:845] (2/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 from AS
2024-08-21 10:16:21,073 INFO [scaling.py:214] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5185450.0, ans=0.0
2024-08-21 10:16:39,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.269e+01 2.538e+01 2.783e+01 3.650e+01, threshold=5.076e+01, percent-clipped=0.0
2024-08-21 10:16:41,477 INFO [train_multi_KD3.py:845] (2/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 from AS
2024-08-21 10:16:57,310 INFO [train_multi_KD3.py:1466] (2/4) Done!